PREFACE TO THE SECOND EDITION


In the years since publication of the first edition of Basic Algebra, many readers
have reacted to the book by sending comments, suggestions, and corrections.
People especially approved of the inclusion of some linear algebra before any
group theory, and they liked the ideas of proceeding from the particular to the
general and of giving examples of computational techniques right from the start.
They appreciated the overall comprehensive nature of the book, associating this
feature with the large number of problems that develop so many sidelights and
applications of the theory.

Along with the general comments and specific suggestions were corrections,
and there were enough corrections, perhaps a hundred in all, so that a second
edition now seems to be in order. Many of the corrections were of minor matters,
yet readers should not have to cope with errors along with new material. Fortunately
no results in the first edition needed to be deleted or seriously modified,
and additional results and problems could be included without renumbering.

For the first edition, the author granted a publishing license to Birkhäuser
Boston that was limited to print media, leaving the question of electronic publication
unresolved. The main change with the second edition is that the question
of electronic publication has now been resolved, and a PDF file, called the “digital
second edition,” is being made freely available to everyone worldwide for personal
use. This file may be downloaded from the author’s own Web page and from
elsewhere.

The main changes to the text of the first edition of Basic Algebra are as follows:

• The corrections sent by readers and by reviewers have been made. The most
significant such correction was a revision to the proof of Zorn’s Lemma, the
earlier proof having had a gap.

• A number of problems have been added at the ends of the chapters, most of
them with partial or full solutions added to the section of Hints at the back of
the book. Of particular note are problems on the following topics:

(a) (Chapter II) the relationship in two and three dimensions between determinants
and areas or volumes,

(b) (Chapters V and IX) further aspects of canonical forms for matrices and
linear mappings,

(c) (Chapter VIII) amplification of uses of the Fundamental Theorem of
Finitely Generated Modules over principal ideal domains,

(d) (Chapter IX) the interplay of extension of scalars and Galois theory,

(e) (Chapter IX) properties and examples of ordered fields and real closed
fields.

• Some revisions have been made to the chapter on field theory (Chapter IX).
It was originally expected, and it continues to be expected, that a reader who
wants a fuller treatment of fields will look also at the chapter on infinite
field extensions in Advanced Algebra. However, the original placement of the
break between volumes left some possible confusion about the role of “normal
extensions” in field theory, and that matter has now been resolved.

• Characteristic polynomials initially have a variable λ as a reminder of how
they arise from eigenvalues. But it soon becomes important to think of them
as abstract polynomials, not as polynomial functions. The indeterminate
had been left as λ throughout most of the book in the original edition, and
some confusion resulted. The indeterminate is now called X rather than λ
from Chapter V on, and characteristic polynomials have been treated
unambiguously thereafter as abstract polynomials.

• Occasional paragraphs have been added that point ahead to material in
Advanced Algebra.

The preface to the first edition mentioned three themes that recur throughout
and blend together at times: the analogy between integers and polynomials in
one variable over a field, the interplay between linear algebra and group theory,
and the relationship between number theory and geometry. A fourth is the gentle
mention of notions in category theory to tie together phenomena that occur in
different areas of algebra; an example of such a notion is “universal mapping
property.” Readers will benefit from looking for these and other such themes,
since recognizing them helps one get a view of the whole subject at once.

It was Benjamin Levitt, Birkhäuser mathematics editor in New York, who
encouraged the writing of a second edition, who made a number of suggestions
about pursuing it, and who passed along comments from several anonymous
referees about the strengths and weaknesses of the book. I am especially grateful
to those readers who have sent me comments over the years. Many corrections and
suggestions were kindly pointed out to the author by Skip Garibaldi of Emory
University and Ario Contact of Shiraz, Iran. The long correction concerning
Zorn’s Lemma resulted from a discussion with Qiu Ruyue. The typesetting was
done by the program Textures using AMS-TeX, and the figures were drawn with
Mathematica.

Just as with the first edition, I invite corrections and other comments from
readers. For as long as I am able, I plan to point to a list of known corrections
from my own Web page, www.math.stonybrook.edu/~aknapp.

A. W. Knapp
January 2016


PREFACE TO THE FIRST EDITION


Basic Algebra and its companion volume Advanced Algebra systematically develop
concepts and tools in algebra that are vital to every mathematician, whether
pure or applied, aspiring or established. These two books together aim to give the
reader a global view of algebra, its use, and its role in mathematics as a whole.
The idea is to explain what the young mathematician needs to know about algebra
in order to communicate well with colleagues in all branches of mathematics.

The books are written as textbooks, and their primary audience is students who
are learning the material for the first time and who are planning a career in which
they will use advanced mathematics professionally. Much of the material in the
books, particularly in Basic Algebra but also in some of the chapters of Advanced
Algebra, corresponds to normal course work. The books include further topics
that may be skipped in required courses but that the professional mathematician
will ultimately want to learn by self-study. The test of each topic for inclusion is
whether it is something that a plenary lecturer at a broad international or national
meeting is likely to take as known by the audience.

The key topics and features of Basic Algebra are as follows:

• Linear algebra and group theory build on each other throughout the book.
A small amount of linear algebra is introduced first, as the topic likely to be
better known by the reader ahead of time, and then a little group theory is
introduced, with linear algebra providing important examples.

• Chapters on linear algebra develop notions related to vector spaces, the
theory of linear transformations, bilinear forms, classical linear groups, and
multilinear algebra.

• Chapters on modern algebra treat groups, rings, fields, modules, and Galois
groups, including many uses of Galois groups and methods of computation.

• Three prominent themes recur throughout and blend together at times: the
analogy between integers and polynomials in one variable over a field, the interplay
between linear algebra and group theory, and the relationship between
number theory and geometry.

• The development proceeds from the particular to the general, often introducing
examples well before a theory that incorporates them.

• More than 400 problems at the ends of chapters illuminate aspects of the
text, develop related topics, and point to additional applications. A separate
90-page section “Hints for Solutions of Problems” at the end of the book gives
detailed hints for most of the problems and complete solutions for many.

• Applications such as the fast Fourier transform, the theory of linear error-correcting
codes, the use of Jordan canonical form in solving linear systems
of ordinary differential equations, and constructions of interest in mathematical
physics arise naturally in sequences of problems at the ends of chapters and
illustrate the power of the theory for use in science and engineering.

Basic Algebra endeavors to show some of the interconnections between
different areas of mathematics, beyond those listed above. Here are examples:
Systems of orthogonal functions make an appearance with inner-product spaces.
Covering spaces naturally play a role in the examination of subgroups of free
groups. Cohomology of groups arises from considering group extensions. Use
of the power-series expansion of the exponential function combines with algebraic
numbers to prove that π is transcendental. Harmonic analysis on a cyclic group
explains the mysterious method of Lagrange resolvents in the theory of Galois
groups.

Algebra plays a singular role in mathematics by having been developed so
extensively at such an early date. Indeed, the major discoveries of algebra even
from the days of Hilbert are well beyond the knowledge of most nonalgebraists
today. Correspondingly, most of the subject matter of the present book is at
least 100 years old. What has changed over the intervening years concerning
algebra books at this level is not so much the mathematics as the point of
view toward the subject matter and the relative emphasis on and generality of
various topics. For example, in the 1920s Emmy Noether introduced vector
spaces and linear mappings to reinterpret coordinate spaces and matrices, and
she defined the ingredients of what was then called “modern algebra”: the
axiomatically defined rings, fields, and modules, and their homomorphisms. The
introduction of categories and functors in the 1940s shifted the emphasis even
more toward the homomorphisms and away from the objects themselves. The
creation of homological algebra in the 1950s gave a unity to algebraic topics
cutting across many fields of mathematics. Category theory underwent a period
of great expansion in the 1950s and 1960s, followed by a contraction and a return
more to a supporting role. The emphasis in topics shifted. Linear algebra had
earlier been viewed as a separate subject, with many applications, while group
theory and the other topics had been viewed as having few applications. Coding
theory, cryptography, and advances in physics and chemistry have changed all
that, and now linear algebra and group theory together permeate mathematics and
its applications. The other subjects build on them, and they too have extensive
applications in science and engineering, as well as in the rest of mathematics.

Basic Algebra presents its subject matter in a forward-looking way that takes
this evolution into account. It is suitable as a text in a two-semester advanced
undergraduate or first-year graduate sequence in algebra. Depending on the graduate
school, it may be appropriate to include also some material from Advanced
Algebra. Briefly, the topics in Basic Algebra are linear algebra and group theory,
rings, fields, and modules. A full list of the topics in Advanced Algebra appears
on page x; of these, the Wedderburn theory of semisimple algebras, homological
algebra, and foundational material for algebraic geometry are the ones that most
commonly appear in syllabi of first-year graduate courses.

A chart on page xix tells the dependence among chapters and can help with
preparing a syllabus. Chapters I-VII treat linear algebra and group theory at
various levels, except that three sections of Chapter IV and one of Chapter V
introduce rings and fields, polynomials, categories and functors, and determinants
over commutative rings with identity. Chapter VIII concerns rings, with emphasis
on unique factorization; Chapter IX concerns field extensions and Galois theory,
with emphasis on applications of Galois theory; and Chapter X concerns modules
and constructions with modules.

For a graduate-level sequence the syllabus is likely to include all of Chapters
I-V and parts of Chapters VIII and IX, at a minimum. Depending on the
knowledge of the students ahead of time, it may be possible to skim much of
the first three chapters and some of the beginning of the fourth; then time may
allow for some of Chapters VI and VII, or additional material from Chapters VIII
and IX, or some of the topics in Advanced Algebra. For many of the topics in
Advanced Algebra, parts of Chapter X of Basic Algebra are prerequisite.

For an advanced undergraduate sequence the first semester can include Chapters
I through III except Section II.9, plus the first six sections of Chapter IV and
as much as reasonable from Chapter V; the notion of category does not appear
in this material. The second semester will involve categories very gently; the
course will perhaps treat the remainder of Chapter IV, the first five or six sections
of Chapter VIII, and at least Sections 1-3 and 5 of Chapter IX.

More detailed information about how the book can be used with courses can
be deduced by using the chart on page xix in conjunction with the section “Guide
for the Reader” on pages xxi-xxiv. In my own graduate teaching, I have built one
course around Chapters I-III, Sections 1-6 of Chapter IV, all of Chapter V, and
about half of Chapter VI. A second course dealt with the remainder of Chapter
IV, a little of Chapter VII, Sections 1-6 of Chapter VIII, and Sections 1-11 of
Chapter IX.

The problems at the ends of chapters are intended to play a more important
role than is normal for problems in a mathematics book. Almost all problems
are solved in the section of hints at the end of the book. This being so, some
blocks of problems form additional topics that could have been included in the
text but were not; these blocks may either be regarded as optional topics, or they
may be treated as challenges for the reader. The optional topics of this kind
usually either carry out further development of the theory or introduce significant
applications. For example, one block of problems at the end of Chapter VII
carries the theory of representations of finite groups a little further by developing
the Poisson summation formula and the fast Fourier transform. For a second
example, blocks of problems at the ends of Chapters IV, VII, and IX introduce
linear error-correcting codes as an application of the theory in those chapters.

Not all problems are of this kind, of course. Some of the problems are
really pure or applied theorems, some are examples showing the degree to which
hypotheses can be stretched, and a few are just exercises. The reader gets no
indication of which problems are of which type, nor of which ones are relatively
easy. Each problem can be solved with tools developed up to that point in the
book, plus any additional prerequisites that are noted.

Beyond a standard one-variable calculus course, the most important prerequisite
for using Basic Algebra is that the reader already know what a proof is,
how to read a proof, and how to write a proof. This knowledge typically is
obtained from honors calculus courses, or from a course in linear algebra, or
from a first junior-senior course in real variables. In addition, it is assumed that
the reader is comfortable with a small amount of linear algebra, including matrix
computations, row reduction of matrices, solutions of systems of linear equations,
and the associated geometry. Some prior exposure to groups is helpful but not
really necessary.

The theorems, propositions, lemmas, and corollaries within each chapter are
indexed by a single number stream. Figures have their own number stream, and
one can find the page reference for each figure from the table on pages xvii-xviii.
Labels on displayed lines occur only within proofs and examples, and they are
local to the particular proof or example in progress. Some readers like to skim
or skip proofs on first reading; to facilitate this procedure, each occurrence of the
word “Proof” or “PROOF” is matched by an occurrence at the right margin of the
symbol □ to mark the end of that proof.

I am grateful to Ann Kostant and Steven Krantz for encouraging this project
and for making many suggestions about pursuing it. I am especially indebted to
an anonymous referee, who made detailed comments about many aspects of a
preliminary version of the book, and to David Kramer, who did the copyediting.
The typesetting was by AMS-TeX, and the figures were drawn with Mathematica.

I invite corrections and other comments from readers. I plan to maintain a list
of known corrections on my own Web page.

A. W. Knapp
August 2006


GUIDE FOR THE READER


This section is intended to help the reader find out what parts of each chapter are
most important and how the chapters are interrelated. Further information of this
kind is contained in the abstracts that begin each of the chapters.

The book pays attention to at least three recurring themes in algebra, allowing
a person to see how these themes arise in increasingly sophisticated ways. These
are the analogy between integers and polynomials in one indeterminate over a
field, the interplay between linear algebra and group theory, and the relationship
between number theory and geometry. Keeping track of how these themes evolve
will help the reader understand the mathematics better and anticipate where it is
headed.

In Chapter I the analogy between integers and polynomials in one indeterminate
over the rationals, reals, or complex numbers appears already in the first three
sections. The main results of these sections are theorems about unique factorization
in each of the two settings. The relevant parts of the underlying structures for
the two settings are the same, and unique factorization can therefore be proved in
both settings by the same argument. Many readers will already know this unique
factorization, but it is worth examining the parallel structure and proof at least
quickly before turning to the chapters that follow.
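The shared structure can be seen in a small computation. The Python sketch below is an illustration added to this guide, not code from the book: it runs one Euclidean-algorithm loop both over the integers and over polynomials with rational coefficients, with the appropriate division algorithm passed in as a parameter. The helper names (`trim`, `poly_divmod`, `monic`, `euclid`) and the representation of a polynomial as a list of `Fraction` coefficients, lowest degree first, are assumptions of the sketch.

```python
from fractions import Fraction

# Sketch assumption: a polynomial over Q is a list of Fraction (or int)
# coefficients, lowest degree first; [] stands for the zero polynomial.

def trim(p):
    """Drop trailing zero coefficients so that p[-1] is the leading coefficient."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def poly_divmod(a, b):
    """Division algorithm in Q[x]: return (q, r) with a = q*b + r and deg r < deg b."""
    a, b = trim(a), trim(b)
    if not b:
        raise ZeroDivisionError("division by the zero polynomial")
    q = [Fraction(0)] * max(len(a) - len(b) + 1, 0)
    r = [Fraction(c) for c in a]
    while len(r) >= len(b):
        shift = len(r) - len(b)
        coef = r[-1] / b[-1]            # quotient of leading coefficients
        q[shift] = coef
        for i, c in enumerate(b):       # subtract coef * x**shift * b
            r[i + shift] -= coef * c
        r = trim(r)                     # exact arithmetic: the leading term cancels
    return trim(q), r

def monic(p):
    """Scale a nonzero polynomial so that its leading coefficient is 1."""
    p = trim(p)
    return [Fraction(c) / p[-1] for c in p] if p else p

def euclid(a, b, divmod_fn, is_zero):
    """One Euclidean algorithm for both settings: replace (a, b) by (b, a mod b)."""
    while not is_zero(b):
        _, r = divmod_fn(a, b)
        a, b = b, r
    return a

# Integers: the built-in divmod is the division algorithm in Z.
print(euclid(84, 36, divmod, lambda n: n == 0))
# Polynomials: gcd(x^2 - 1, x^2 - 3x + 2), normalized to be monic.
print(monic(euclid([-1, 0, 1], [2, -3, 1], poly_divmod, lambda p: not trim(p))))
```

Here the integer call gives 12, and the polynomial call gives the monic greatest common divisor x - 1, whose coefficient list compares equal to `[-1, 1]`. Only the division step and the test for zero differ between the two settings, which is the parallel the text describes.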

Before proceeding very far into the book, it is worth looking also at the appendix
to see whether all its topics are familiar. Readers will find Section A1 useful
at least for its summary of set-theoretic notation and for its emphasis on the
distinction between range and image for a function. This distinction is usually
unimportant in analysis but becomes increasingly important as one studies more
advanced topics in algebra. Readers who have not specifically learned about
equivalence relations and partial orderings can learn about them from Sections
A2 and A5. Sections A3 and A4 concern the real and complex numbers; the
emphasis is on notation and the Intermediate Value Theorem, which plays a role
in proving the Fundamental Theorem of Algebra. Zorn’s Lemma and cardinality
in Sections A5 and A6 are usually unnecessary in an undergraduate course. They
arise most importantly in Sections II.9 and IX.4, which are normally omitted in
an undergraduate course, and in Proposition 8.8, which is invoked only in the last
few sections of Chapter VIII.

The remainder of this section is an overview of individual chapters and pairs
of chapters.

Chapter I is in three parts. The first part, as mentioned above, establishes unique
factorization for the integers and for polynomials in one indeterminate over the
rationals, reals, or complex numbers. The second part defines permutations and
shows that they have signs such that the sign of any composition is the product of
the signs; this result is essential for defining general determinants in Section II.7.
The third part will likely be a review for all readers. It establishes notation for row
reduction of matrices and for operations on matrices, and it uses row reduction
to show that a one-sided inverse for a square matrix is a two-sided inverse.
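The multiplicativity of signs can at least be checked mechanically in small cases. The following Python sketch is an illustration added here, not the book's construction: it defines the sign by counting inversions (one standard definition; the book may set signs up differently) and verifies by brute force that sign(σ ∘ τ) = sign(σ) sign(τ) for all σ, τ in a small symmetric group. Storing a permutation of {0, ..., n-1} as a tuple of images is an assumption of the sketch, and a finite check illustrates the result without proving it.

```python
from itertools import permutations

# Sketch assumption: a permutation of {0, ..., n-1} is a tuple p with p[i] = image of i.

def sign(p):
    """Return +1 or -1 according to whether the number of inversions is even or odd."""
    n = len(p)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
    return -1 if inversions % 2 else 1

def compose(p, q):
    """Return p o q, the permutation that applies q first and then p."""
    return tuple(p[q[i]] for i in range(len(q)))

def sign_is_multiplicative(n):
    """Brute-force check that sign(p o q) == sign(p) * sign(q) for all p, q in S_n."""
    perms = list(permutations(range(n)))
    return all(sign(compose(p, q)) == sign(p) * sign(q) for p in perms for q in perms)

print(sign((1, 0, 2)))             # a transposition: prints -1
print(sign((1, 2, 0)))             # a 3-cycle: prints 1
print(sign_is_multiplicative(4))   # checks all 24 * 24 pairs in S_4: prints True
```

Counting inversions is quadratic in n, which is harmless at this size; the point of the check is only to make concrete the statement that the sign of a composition is the product of the signs.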

Chapters II-III treat the fundamentals of linear algebra. Whereas the matrix
computations in Chapter I were concrete, Chapters II-III are relatively abstract.
Much of this material is likely to be a review for graduate students. The geometric
interpretation of vector spaces, subspaces, and linear mappings is not included in
the chapter, being taken as known previously. The fundamental idea that a newly
constructed object might be characterized by a “universal mapping property”
appears for the first time in Chapter II, and it appears more and more frequently
throughout the book. One aspect of this idea is that it is sometimes not so
important what certain constructed objects are, but what they do. A related idea
being emphasized is that the mappings associated with a newly constructed object
are likely to be as important as the object, if not more so; at the least, one needs to
stop and find what those mappings are. Section II.9 uses Zorn’s Lemma and can
be deferred until Chapter IX if one wants. Chapter III discusses special features
of real and complex vector spaces endowed with inner products. The main result
is the Spectral Theorem in Section 3. Many of the problems at the end of the
chapter make contact with real analysis. The subject of linear algebra continues
in Chapter V.

Chapter IV is the primary chapter on group theory and may be viewed as in
three parts. Sections 1-6 form the first part, which is essential for all later chapters
in the book. Sections 1-3 introduce groups and some associated constructions,
along with a number of examples. Many of the examples will be seen to be
related to specific or general vector spaces, and thus the theme of the interplay
between group theory and linear algebra is appearing concretely for the first time.
In practice, many examples of groups arise in the context of group actions, and
abstract group actions are defined in Section 6. Of particular interest are group
representations, which are group actions on a vector space by linear mappings.
Sections 4-5 are a digression to define rings, fields, and ring homomorphisms,
and to extend the theories concerning polynomials and vector spaces as presented
in Chapters I-II. The immediate purpose of the digression is to make prime fields,
their associated multiplicative groups, and the notion of characteristic available
for the remainder of the chapter. The definition of vector space is extended
to allow scalars from any field. The definition of polynomial is extended to
allow coefficients from any commutative ring with identity, rather than just the
rationals or reals or complex numbers, and to allow more than one indeterminate.
Universal mapping properties for polynomial rings are proved. Sections 7-10
form the second part of the chapter and are a continuation of group theory. The
main result is the Fundamental Theorem of Finitely Generated Abelian Groups,
which is in Section 9. Section 11 forms the third part of the chapter. This section
is a gentle introduction to categories and functors, which are useful for working
with parallel structures in different settings within algebra. As S. Mac Lane says
in his book, “Category theory asks of every type of Mathematical object: ‘What
are the morphisms?’; it suggests that these morphisms should be described at the
same time as the objects. . . . This emphasis on (homo)morphisms is largely due to
Emmy Noether, who emphasized the use of homomorphisms of groups and rings.”
The simplest parallel structure reflected in categories is that of an isomorphism.
The section also discusses general notions of product and coproduct functors.
Examples of products are direct products in linear algebra and in group theory.
Examples of coproducts are direct sums in linear algebra and in abelian group
theory, as well as disjoint unions in set theory. The theory in this section helps in
unifying the mathematics that is to come in Chapters VI-VIII and X. The subject
of group theory is continued in Chapter VII, which assumes knowledge of the
material on category theory.

Chapters V and VI continue the development of linear algebra. Chapter VI uses
categories, but Chapter V does not. Most of Chapter V concerns the analysis of a
linear transformation carrying a finite-dimensional vector space over a field into
itself. The questions are to find invariants of such transformations and to classify
the transformations up to similarity. Section 2 at the start extends the theory of
determinants so that the matrices are allowed to have entries in a commutative
ring with identity; this extension is necessary in order to be able to work easily
with characteristic polynomials. The extension of this theory is carried out by
an important principle known as the “permanence of identities.” Chapter VI
largely concerns bilinear forms and tensor products, again in the context that the
coefficients are from a field. This material is necessary in many applications to
geometry and physics, but it is not needed in Chapters VII-IX. Many objects in
the chapter are constructed in such a way that they are uniquely determined by
a universal mapping property. Problems 18-22 at the end of the chapter discuss
universal mapping properties in the general context of category theory, and they
show that a uniqueness theorem is automatic in all cases.

Chapter VII continues the development of group theory, making use of category
theory. It is in two parts. Sections 1-3 concern free groups and the topic of
generators and relations; they are essential for abstract descriptions of groups
and for work in topology involving fundamental groups. Section 3 constructs a
notion of free product and shows that it is the coproduct functor for the category
of groups. Sections 4-6 continue the theme of the interplay of group theory and
linear algebra. Section 4 analyzes group representations of a finite group when
the underlying field is the complex numbers, and Section 5 applies this theory
to obtain a conclusion about the structure of finite groups. Section 6 studies
extensions of groups and uses them to motivate the subject of cohomology of
groups.

Chapter VIII introduces modules, giving many examples in Section 1, and
then goes on to discuss questions of unique factorization in integral domains.
Section 6 obtains a generalization for principal ideal domains of the Fundamental
Theorem of Finitely Generated Abelian Groups, once again illustrating the first
theme: similarities between the integers and certain polynomial rings. Section 7
introduces the third theme, the relationship between number theory and geometry,
as a more sophisticated version of the first theme. The section compares a certain
polynomial ring in two variables with a certain ring of algebraic integers that
extends the ordinary integers. Unique factorization of elements fails for both, but
the geometric setting has a more geometrically meaningful factorization in terms
of ideals that is evidently unique. This kind of unique factorization turns out to
work for the ring of algebraic integers as well. Sections 8-11 expand the examples
in Section 7 into a theory of unique factorization of ideals in any integrally closed
Noetherian domain whose nonzero prime ideals are all maximal.

Chapter  IX  analyzes  algebraic  extensions  of  fields.  The  first  13  sections 
make  use  only  of  Sections  1-6  in  Chapter  VIII.  Sections  1-5  of  Chapter  IX 
give  the  foundational  theory,  which  is  sufficient  to  exhibit  all  the  finite  fields  and 
to  prove  that  certain  classically  proposed  constructions  in  Euclidean  geometry 
are  impossible.  Sections  6-8  introduce  Galois  theory,  but  Theorem  9.28  and 
its three corollaries may be skipped if Sections 14-17 are to be omitted. Sections
9-11 give a first round of applications of Galois theory: Gauss's theorem
about  which  regular  n-gons  are  in  principle  constructible  with  straightedge  and 
compass,  the  Fundamental  Theorem  of  Algebra,  and  the  Abel-Galois  theorem 
that  solvability  of  a  polynomial  equation  with  rational  coefficients  in  terms  of 
radicals  implies  solvability  of  the  Galois  group.  Sections  12-13  give  a  second 
round  of  applications:  Gauss’s  method  in  principle  for  actually  constructing  the 
constructible  regular  n-gons  and  a  converse  to  the  Abel-Galois  theorem.  Sections 
14-17 make use of Sections 7-11 of Chapter VIII, proving that π is transcendental
and  obtaining  two  methods  for  computing  Galois  groups. 

Chapter  X  is  a  relatively  short  chapter  developing  further  tools  for  dealing 
with  modules  over  a  ring  with  identity.  The  main  construction  is  that  of  the 
tensor  product  over  a  ring  of  a  unital  right  module  and  a  unital  left  module,  the 
result  being  an  abelian  group.  The  chapter  makes  use  of  material  from  Chapters 
VI  and  VIII,  but  not  from  Chapter  IX. 
Preliminaries about the Integers, Polynomials, and Matrices


Abstract.  This  chapter  is  mostly  a  review,  discussing  unique  factorization  of  positive  integers, 
unique  factorization  of  polynomials  whose  coefficients  are  rational  or  real  or  complex,  signs  of 
permutations,  and  matrix  algebra. 

Sections  1-2  concern  unique  factorization  of  positive  integers.  Section  1  proves  the  division 
and  Euclidean  algorithms,  used  to  compute  greatest  common  divisors.  Section  2  establishes  unique 
factorization  as  a  consequence  and  gives  several  number-theoretic  consequences,  including  the 
Chinese Remainder Theorem and the evaluation of the Euler φ function.

Section 3 develops unique factorization of rational and real and complex polynomials in one
indeterminate completely analogously, and it derives the complete factorization of complex polynomials
from  the  Fundamental  Theorem  of  Algebra.  The  proof  of  the  fundamental  theorem  is  postponed  to 
Chapter  IX. 

Section 4 discusses permutations of a finite set, establishing the decomposition of each
permutation as a disjoint product of cycles. The sign of a permutation is introduced, and it is proved that
the  sign  of  a  product  is  the  product  of  the  signs. 

Sections  5-6  concern  matrix  algebra.  Section  5  reviews  row  reduction  and  its  role  in  the  solution 
of  simultaneous  linear  equations.  Section  6  defines  the  arithmetic  operations  of  addition,  scalar 
multiplication,  and  multiplication  of  matrices.  The  process  of  matrix  inversion  is  related  to  the 
method  of  row  reduction,  and  it  is  shown  that  a  square  matrix  with  a  one-sided  inverse  automatically 
has  a  two-sided  inverse  that  is  computable  via  row  reduction. 


1.  Division  and  Euclidean  Algorithms 

The  first  three  sections  give  a  careful  proof  of  unique  factorization  for  integers 
and  for  polynomials  with  rational  or  real  or  complex  coefficients,  and  they  give 
an  indication  of  some  first  consequences  of  this  factorization.  For  the  moment 
let  us  restrict  attention  to  the  set  Z  of  integers.  We  take  addition,  subtraction, 
and  multiplication  within  Z  as  established,  as  well  as  the  properties  of  the  usual 
ordering  in  Z. 

A  factor  of  an  integer  n  is  a  nonzero  integer  k  such  that  n  =  kl  for  some 
integer  l.  In  this  case  we  say  also  that  k  divides  n,  that  k  is  a  divisor  of  n,  and 
that n is a multiple of k. We write $k \mid n$ for this relationship. If n is nonzero, any
product formula $n = k l_1 \cdots l_r$ is a factorization of n. A unit in Z is a divisor
of 1, hence is either +1 or −1. The factorization $n = kl$ of $n \neq 0$ is called
nontrivial if neither k nor l is a unit. An integer $p > 1$ is said to be prime if it
has no nontrivial factorization $p = kl$.

The  statement  of  unique  factorization  for  positive  integers,  which  will  be  given 
precisely  in  Section  2,  says  roughly  that  each  positive  integer  is  the  product  of 
primes and that this decomposition is unique apart from the order of the factors.¹
Existence  will  follow  by  an  easy  induction.  The  difficulty  is  in  the  uniqueness.  We 
shall  prove  uniqueness  by  a  sequence  of  steps  based  on  the  “Euclidean  algorithm,” 
which  we  discuss  in  a  moment.  In  turn,  the  Euclidean  algorithm  relies  on  the 
following. 

Proposition 1.1 (division algorithm). If a and b are integers with $b \neq 0$, then
there exist unique integers q and r such that $a = bq + r$ and $0 \le r < |b|$.

PROOF. Possibly replacing b by −b and q by −q, we may assume that $b > 0$. The integers
n with $bn \le a$ are bounded above by $|a|$, and there exists such an n, namely
$n = -|a|$. Therefore there is a largest such integer, say $n = q$. Set $r = a - bq$.
Then $r \ge 0$ and $a = bq + r$. If $r \ge b$, then $r - b \ge 0$ says that
$a = b(q + 1) + (r - b) \ge b(q + 1)$. The inequality $q + 1 > q$ contradicts the
maximality of q, and we conclude that $r < b$. This proves existence.

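For uniqueness when $b > 0$, suppose $a = bq_1 + r_1 = bq_2 + r_2$ with $0 \le r_1, r_2 < b$. Subtracting,
we obtain $b(q_1 - q_2) = r_2 - r_1$ with $|r_2 - r_1| < b$. Since the left side is a multiple of b, this is a contradiction
unless $r_2 - r_1 = 0$; then $b(q_1 - q_2) = 0$ with $b \neq 0$ forces $q_1 = q_2$. □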
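A short Python sketch of Proposition 1.1 may help when b is negative. It assumes Python integers and uses the built-in `divmod`, whose remainder takes the sign of the divisor, so a correction step is needed to reach $0 \le r < |b|$. The name `division_algorithm` is our own, chosen only for illustration.

```python
def division_algorithm(a, b):
    """Return (q, r) with a == b*q + r and 0 <= r < abs(b), as in Proposition 1.1."""
    if b == 0:
        raise ValueError("b must be nonzero")
    # divmod rounds the quotient toward minus infinity, so r has the sign of b;
    # for b > 0 this already gives 0 <= r < b.
    q, r = divmod(a, b)
    if r < 0:
        # Happens only when b < 0: move one more multiple of b into the quotient.
        q += 1
        r -= b  # r - b = r + |b| now lies in [0, |b|)
    return q, r
```

For example, `division_algorithm(7, -2)` returns `(-3, 1)`, since $7 = (-2)(-3) + 1$.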

Let a and b be integers not both 0. The greatest common divisor of a and
b is the largest integer $d > 0$ such that $d \mid a$ and $d \mid b$. Let us see existence.
The integer 1 divides a and b. If b, for example, is nonzero, then any such d
has $|d| \le |b|$, and hence the greatest common divisor indeed exists. We write
$d = \mathrm{GCD}(a, b)$.

Let us suppose that $b \neq 0$. The Euclidean algorithm consists of iterated
application of the division algorithm (Proposition 1.1) to a and b until the remainder
term r disappears:

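$$
\begin{aligned}
a &= b q_1 + r_1, &\qquad 0 &< r_1 < |b|,\\
b &= r_1 q_2 + r_2, & 0 &< r_2 < r_1,\\
r_1 &= r_2 q_3 + r_3, & 0 &< r_3 < r_2,\\
&\;\;\vdots\\
r_{n-2} &= r_{n-1} q_n + r_n, & 0 &< r_n < r_{n-1} \quad (\text{with } r_n \neq 0, \text{ say}),\\
r_{n-1} &= r_n q_{n+1}.
\end{aligned}
$$

¹ It is to be understood that the prime factorization of 1 is the empty product.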
The process must stop with some remainder term $r_{n+1}$ equal to 0 in this way, since
$|b| > r_1 > r_2 > \cdots \ge 0$ is a strictly decreasing sequence of nonnegative integers. The last nonzero remainder term, namely $r_n$ above, will
be of interest to us.
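The iteration just described can be sketched in Python. The helpers `euclidean_remainders` and `gcd_euclid` are hypothetical names, not from the text; each step takes the remainder $0 \le r < |\text{divisor}|$ of Proposition 1.1, and the returned list ends with the remainder $r_{n+1} = 0$.

```python
def euclidean_remainders(a, b):
    """Return [r_1, r_2, ..., r_{n+1}] from the Euclidean algorithm; the last entry is 0."""
    if b == 0:
        raise ValueError("b must be nonzero")
    remainders = []
    prev, cur = a, b
    while True:
        r = prev % abs(cur)      # remainder with 0 <= r < |cur|, as in Proposition 1.1
        remainders.append(r)
        if r == 0:
            return remainders
        prev, cur = cur, r       # divide the old divisor by the new remainder

def gcd_euclid(a, b):
    """GCD(a, b) as the last nonzero remainder r_n (|b| if b already divides a)."""
    rs = euclidean_remainders(a, b)
    return rs[-2] if len(rs) >= 2 else abs(b)
```

On the example that follows, `euclidean_remainders(13, 5)` gives `[3, 2, 1, 0]`, whose last nonzero entry is 1.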

Example.  For  a  =  13  and  b  =  5,  the  steps  read 

$$
\begin{aligned}
13 &= 5 \cdot 2 + 3,\\
5 &= 3 \cdot 1 + 2,\\
3 &= 2 \cdot 1 + \boxed{1},\\
2 &= 1 \cdot 2.
\end{aligned}
$$

The  last  nonzero  remainder  term  is  written  with  a  box  around  it. 


Proposition 1.2. Let a and b be integers with $b \neq 0$, and let $d = \mathrm{GCD}(a, b)$.
Then 

(a) the number $r_n$ in the Euclidean algorithm is exactly d,

(b)  any  divisor  d'  of  both  a  and  b  necessarily  divides  d, 

(c)  there  exist  integers  x  and  y  such  that  ax  +  by  =  d. 

Remark.  Proposition  1.2c  is  sometimes  called  Bezout’s  identity. 

Example,  continued.  We  rewrite  the  steps  of  the  Euclidean  algorithm,  as 
applied  in  the  above  example  with  a  =  13  and  b  =  5,  so  as  to  yield  successive 
substitutions: 


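$$
\begin{aligned}
13 &= 5 \cdot 2 + 3, &\qquad 3 &= 13 - 5 \cdot 2,\\
5 &= 3 \cdot 1 + 2, & 2 &= 5 - 3 \cdot 1 = 5 - (13 - 5 \cdot 2) \cdot 1 = 5 \cdot 3 - 13 \cdot 1,\\
3 &= 2 \cdot 1 + 1, & 1 &= 3 - 2 \cdot 1 = (13 - 5 \cdot 2) - (5 \cdot 3 - 13 \cdot 1) \cdot 1\\
& & &= 13 \cdot 2 - 5 \cdot 5.
\end{aligned}
$$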


Thus we see that $1 = 13x + 5y$ with $x = 2$ and $y = -5$. This shows for the
example that the number $r_n$ works in place of d in Proposition 1.2c, and the rest
of  the  proof  of  the  proposition  for  this  example  is  quite  easy.  Let  us  now  adjust 
this  computation  to  obtain  a  complete  proof  of  the  proposition  in  general. 

Proof of Proposition 1.2. Put $r_0 = b$ and $r_{-1} = a$, so that

$$
r_{k-2} = r_{k-1} q_k + r_k \qquad \text{for } 1 \le k \le n. \tag{$*$}
$$


The  argument  proceeds  in  three  steps. 
Step 1. We show that $r_n$ is a divisor of both a and b. In fact, from $r_{n-1} =
r_n q_{n+1}$, we have $r_n \mid r_{n-1}$. Let $1 \le k \le n$, and assume inductively that $r_n$ divides
$r_{k-1}, \ldots, r_{n-1}, r_n$. Then $(*)$ shows that $r_n$ divides $r_{k-2}$. Induction allows us to
conclude that $r_n$ divides $r_{-1}, r_0, \ldots, r_{n-1}$. In particular, $r_n$ divides a and b.

Step 2. We prove that $ax + by = r_n$ for suitable integers x and y. In fact,
we show by induction on k for $k \le n$ that there exist integers x and y with
$ax + by = r_k$. For $k = -1$ and $k = 0$, this conclusion is trivial. If $k \ge 1$ is given
and if the result is known for $k - 2$ and $k - 1$, then we have

$$
\begin{aligned}
a x_2 + b y_2 &= r_{k-2},\\
a x_1 + b y_1 &= r_{k-1}
\end{aligned}
\tag{$**$}
$$

for suitable integers $x_2, y_2, x_1, y_1$. We multiply the second of the equalities of
$(**)$ by $q_k$, subtract, and substitute into $(*)$. The result is

$$
r_k = r_{k-2} - r_{k-1} q_k = a(x_2 - q_k x_1) + b(y_2 - q_k y_1),
$$

and the induction is complete. Thus $ax + by = r_n$ for suitable x and y.

Step 3. Finally we deduce (a), (b), and (c). Step 1 shows that $r_n$ divides a and
b. If $d' > 0$ divides both a and b, the result of Step 2 shows that $d' \mid r_n$. Thus
$d' \le r_n$, and $r_n$ is the greatest common divisor. This is the conclusion of (a); (b)
follows from (a) since $d' \mid r_n$, and (c) follows from (a) and Step 2. □
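Step 2 is constructive: the coefficients for $r_k$ come from those for $r_{k-2}$ and $r_{k-1}$ via $x_k = x_{k-2} - q_k x_{k-1}$ and $y_k = y_{k-2} - q_k y_{k-1}$. The Python sketch below (the name `extended_euclid` is ours) carries out that recurrence, starting from $r_{-1} = a$ with coefficients (1, 0) and $r_0 = b$ with (0, 1). It assumes integer inputs with $b \neq 0$.

```python
def extended_euclid(a, b):
    """Return (d, x, y) with d = GCD(a, b) and a*x + b*y == d (Proposition 1.2)."""
    if b == 0:
        raise ValueError("b must be nonzero")
    # Invariant: a*x_prev + b*y_prev == r_prev and a*x_cur + b*y_cur == r_cur.
    r_prev, x_prev, y_prev = a, 1, 0   # r_{-1} = a
    r_cur, x_cur, y_cur = b, 0, 1      # r_0 = b
    while True:
        r = r_prev % abs(r_cur)        # remainder r_k with 0 <= r_k < |r_{k-1}|
        if r == 0:
            break                      # r_cur is the last nonzero remainder r_n
        q = (r_prev - r) // r_cur      # exact: r_{k-2} = r_{k-1} q_k + r_k
        # Step 2: r_k = r_{k-2} - q_k r_{k-1}, so the coefficients combine the same way.
        x_new, y_new = x_prev - q * x_cur, y_prev - q * y_cur
        r_prev, x_prev, y_prev = r_cur, x_cur, y_cur
        r_cur, x_cur, y_cur = r, x_new, y_new
    if r_cur < 0:
        # Only when b < 0 and b divides a: the GCD is |b| = -b.
        return -r_cur, -x_cur, -y_cur
    return r_cur, x_cur, y_cur
```

On the running example, `extended_euclid(13, 5)` returns `(1, 2, -5)`, the same coefficients $x = 2$ and $y = -5$ found above.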


Corollary 1.3. Within Z, if c is a nonzero integer that divides a product mn
and if $\mathrm{GCD}(c, m) = 1$, then c divides n.

Proof.  Proposition  1.2c  produces  integers  x  and  y  with  cx  +  my  =  1. 
Multiplying  by  n,  we  obtain  cnx  +  mny  =  n.  Since  c  divides  mn  and  divides 
itself,  c  divides  both  terms  on  the  left  side.  Therefore  it  divides  the  right  side, 
which  is  n.  □ 


Corollary 1.4. Within Z, if a and b are nonzero integers with $\mathrm{GCD}(a, b) = 1$
and if both of them divide the integer m, then ab divides m.

Proof. Proposition 1.2c produces integers x and y with $ax + by = 1$.
Multiplying by m, we obtain $amx + bmy = m$, which we rewrite in integers
as $ab(m/b)x + ab(m/a)y = m$. Since ab divides each term on the left side, it
divides the right side, which is m. □


2.  Unique  Factorization  of  Integers 

We  come  now  to  the  theorem  asserting  unique  factorization  for  the  integers.  The 
precise  statement  is  as  follows. 
Theorem 1.5 (Fundamental Theorem of Arithmetic). Each positive integer
n can be written as a product of primes, $n = p_1 p_2 \cdots p_r$, with the integer 1
being written as an empty product. This factorization is unique in the following
sense: if $n = q_1 q_2 \cdots q_s$ is another such factorization, then $r = s$ and, after some
reordering of the factors, $q_j = p_j$ for $1 \le j \le r$.

The  main  step  is  the  following  lemma,  which  relies  on  Corollary  1.3. 

Lemma  1.6.  Within  Z,  if  p  is  a  prime  and  p  divides  a  product  ab ,  then  p 
divides  a  or  p  divides  b. 

Remark.  Lemma  1.6  is  sometimes  known  as  Euclid’s  Lemma. 

PROOF. Suppose that p does not divide a. Since p is prime, its only positive divisors are 1 and p, and therefore $\mathrm{GCD}(a, p) = 1$.
Taking $m = a$, $n = b$, and $c = p$ in Corollary 1.3, we see that p divides b. □

Proof  of  existence  in  Theorem  1.5.  We  induct  on  n,  the  case  n  =  1  being 
handled  by  an  empty  product  expansion.  If  the  result  holds  for  k  =  1  through 
k  =  n  —  1,  there  are  two  cases:  n  is  prime  and  n  is  not  prime.  If  n  is  prime,  then 
n  =  n  is  the  desired  factorization.  Otherwise  we  can  write  n  =  ab  nontrivially 
with $a > 1$ and $b > 1$. Then $a \le n - 1$ and $b \le n - 1$, so that a and b have
factorizations  into  primes  by  the  inductive  hypothesis.  Putting  them  together 
yields  a  factorization  into  primes  for  n  =  ab.  □ 

Proof of uniqueness in Theorem 1.5. Suppose that $n = p_1 p_2 \cdots p_r = q_1 q_2 \cdots q_s$
with all factors prime and with $r \le s$. We prove the uniqueness by
induction on r, the case $r = 0$ being trivial and the case $r = 1$ following from
the definition of "prime." Inductively from Lemma 1.6 we have $p_r \mid q_k$ for some
k. Since $q_k$ is prime, $p_r = q_k$. Thus we can cancel and obtain

$$
p_1 p_2 \cdots p_{r-1} = q_1 q_2 \cdots \widehat{q_k} \cdots q_s,
$$

the hat indicating an omitted factor. By induction the factors
on the two sides here are the same except for order. Thus the same conclusion
is valid when comparing the two sides of the equality $p_1 p_2 \cdots p_r = q_1 q_2 \cdots q_s$.
The induction is complete, and the desired uniqueness follows. □

In the product expansion of Theorem 1.5, it is customary to group factors that
are equal, thus writing the positive integer n as $n = p_1^{k_1} \cdots p_r^{k_r}$ with the primes
$p_j$ distinct and with the integers $k_j$ all $\ge 0$. This kind of decomposition is unique
up to order if all factors $p_j^{k_j}$ with $k_j = 0$ are dropped, and we call it a prime
factorization of n.
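The existence half of Theorem 1.5 can be turned into a computation. The Python sketch below uses trial division, a standard technique that is not the text's inductive argument, and returns the grouped form $p_1^{k_1} \cdots p_r^{k_r}$ as a dictionary `{p_j: k_j}`. The name `prime_factorization` and the dictionary representation are assumptions made for illustration.

```python
def prime_factorization(n):
    """Return {p: k} with n equal to the product of p**k (Theorem 1.5).

    Uses trial division; n = 1 gives {}, the empty product.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    factors = {}
    p = 2
    while p * p <= n:
        # A composite p never divides n here: its smaller prime factors
        # have already been divided out completely.
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        # No divisor p with p*p <= n remains, so the leftover n is prime.
        factors[n] = factors.get(n, 0) + 1
    return factors
```

For instance, `prime_factorization(360)` gives `{2: 3, 3: 2, 5: 1}`, and `prime_factorization(1)` gives `{}`.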

Corollary 1.7. If $n = p_1^{k_1} \cdots p_r^{k_r}$ is a prime factorization of a positive integer
n, then the positive divisors d of n are exactly all products $d = p_1^{l_1} \cdots p_r^{l_r}$ with
$0 \le l_j \le k_j$ for all j.
Remark. A general divisor of n within Z is the product of a unit ±1 and a
positive  divisor. 

PROOF. Certainly any such product divides n. Conversely if d divides n, write
$n = dx$ for some positive integer x. Apply Theorem 1.5 to d and to x, form the
resulting prime factorizations, and multiply them together. Then we see from the
uniqueness for the prime factorization of n that the only primes that can occur in
the expansions of d and x are $p_1, \ldots, p_r$ and that the sum of the exponents of $p_j$
in the expansions of d and x is $k_j$. The result follows. □
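Corollary 1.7 says that the positive divisors arise by letting each exponent $l_j$ range independently over $0, \ldots, k_j$. A Python sketch, taking the factorization as a dictionary `{p_j: k_j}` (our own representation and helper name), enumerates them with `itertools.product`:

```python
from itertools import product

def divisors_from_factorization(factors):
    """Sorted positive divisors p_1**l_1 * ... * p_r**l_r with 0 <= l_j <= k_j."""
    primes = list(factors)
    divisors = []
    # One tuple (l_1, ..., l_r) per choice of exponents; {} yields the single empty tuple.
    for exponents in product(*(range(factors[p] + 1) for p in primes)):
        d = 1
        for p, l in zip(primes, exponents):
            d *= p ** l
        divisors.append(d)
    return sorted(divisors)
```

The number of divisors is $(k_1 + 1) \cdots (k_r + 1)$, for example 24 for $360 = 2^3 \cdot 3^2 \cdot 5$.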

If  we  want  to  compare  prime  factorizations  for  two  positive  integers,  we  can 
insert  0th  powers  of  primes  as  necessary  and  thereby  assume  that  the  same  primes 
appear  in  both  expansions.  Using  this  device,  we  obtain  a  formula  for  greatest 
common  divisors. 

Corollary 1.8. If two positive integers a and b have expansions as products
of powers of r distinct primes given by $a = p_1^{k_1} \cdots p_r^{k_r}$ and $b = p_1^{l_1} \cdots p_r^{l_r}$, then

$$
\mathrm{GCD}(a, b) = p_1^{\min(k_1, l_1)} \cdots p_r^{\min(k_r, l_r)}.
$$

PROOF. Let d' be the right side of the displayed equation. It is plain that d'
is positive and that d' divides a and b. On the other hand, two applications of
Corollary 1.7 show that the greatest common divisor of a and b is a number d
of the form $p_1^{m_1} \cdots p_r^{m_r}$ with the property that $m_j \le k_j$ and $m_j \le l_j$ for all j.
Therefore $m_j \le \min(k_j, l_j)$ for all j, and $d \le d'$. Since any positive divisor of
both a and b is $\le d$, we have $d' \le d$. Thus $d' = d$. □
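Corollary 1.8 can be checked numerically. The Python sketch below is a hypothetical helper that takes factorizations as dictionaries `{prime: exponent}` and treats a prime missing from one dictionary as having exponent 0, which is the device of inserting 0th powers described above.

```python
def gcd_from_factorizations(fa, fb):
    """GCD(a, b) as the product of p**min(k_p, l_p), from {prime: exponent} dicts."""
    g = 1
    for p in set(fa) | set(fb):
        # A prime absent from one factorization appears there to the 0th power.
        g *= p ** min(fa.get(p, 0), fb.get(p, 0))
    return g
```

Here $360 = 2^3 \cdot 3^2 \cdot 5$ and $84 = 2^2 \cdot 3 \cdot 7$, so the sketch returns $2^2 \cdot 3 = 12$.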

In special cases Corollary 1.8 provides a useful way to compute $\mathrm{GCD}(a, b)$,
but the Euclidean algorithm is usually a more efficient procedure, since it does not require factoring a and b. Nevertheless,
Corollary 1.8 remains a handy tool for theoretical purposes. Here is an example:
Two nonzero integers a and b are said to be relatively prime if $\mathrm{GCD}(a, b) = 1$.
It is immediate from Corollary 1.8 that two nonzero integers a and b are relatively
prime if and only if there is no prime p that divides both a and b.

Corollary 1.9 (Chinese Remainder Theorem). Let a and b be positive relatively
prime integers. To each pair (r, s) of integers with $0 \le r < a$ and $0 \le s < b$
corresponds a unique integer n such that $0 \le n < ab$, a divides $n - r$, and b
divides $n - s$. Moreover, every integer n with $0 \le n < ab$ arises from some such
pair (r, s).

Remark. In notation for congruences that we introduce formally in Chapter IV,
the result says that if $\mathrm{GCD}(a, b) = 1$, then the congruences $n \equiv r \bmod a$ and
$n \equiv s \bmod b$ have one and only one simultaneous solution n with $0 \le n < ab$.
PROOF. Let us see that n exists as asserted. Since a and b are relatively
prime, Proposition 1.2c produces integers x' and y' such that $ax' - by' = 1$.
Multiplying by $s - r$, we obtain $ax - by = s - r$ for suitable integers x and y.
Put $t = ax + r = by + s$, and write by the division algorithm (Proposition 1.1)
$t = abq + n$ for some integer q and for some integer n with $0 \le n < ab$. Then
$n - r = t - abq - r = ax - abq$ is divisible by a, and similarly $n - s$ is divisible
by b.

Suppose that n and n' both have the asserted properties. Then a divides
$n - n' = (n - r) - (n' - r)$, and b divides $n - n' = (n - s) - (n' - s)$. Since
a and b are relatively prime, Corollary 1.4 shows that ab divides $n - n'$. But
$|n - n'| < ab$, and the only integer N with $|N| < ab$ that is divisible by ab is
$N = 0$. Thus $n - n' = 0$ and $n = n'$. This proves uniqueness.

Finally  the  argument  just  given  defines  a  one-one  function  from  a  set  of  ab 
pairs  (r,  s)  to  a  set  of  ab  elements  n.  Its  image  must  therefore  be  all  such  integers 
n .  This  proves  the  corollary.  □ 
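The existence argument above is constructive. The Python sketch below follows it step by step; the names `bezout` and `crt` are hypothetical, and the inputs are assumed to be positive, relatively prime a and b with $0 \le r < a$ and $0 \le s < b$.

```python
def bezout(a, b):
    """Return (x, y) with a*x + b*y == 1, for relatively prime positive a and b."""
    # Extended Euclidean algorithm; invariant: a*x0 + b*y0 == r0 and a*x1 + b*y1 == r1.
    r0, x0, y0 = a, 1, 0
    r1, x1, y1 = b, 0, 1
    while r1 != 0:
        q = r0 // r1
        r0, r1 = r1, r0 - q * r1
        x0, x1 = x1, x0 - q * x1
        y0, y1 = y1, y0 - q * y1
    if r0 != 1:
        raise ValueError("a and b must be relatively prime")
    return x0, y0

def crt(r, s, a, b):
    """Return the n with 0 <= n < a*b such that a divides n - r and b divides n - s."""
    x1, y1 = bezout(a, b)                # a*x1 + b*y1 == 1
    # With x' = x1 and y' = -y1 this reads a*x' - b*y' == 1; multiply by s - r.
    x, y = x1 * (s - r), -y1 * (s - r)   # a*x - b*y == s - r
    t = a * x + r                        # equals b*y + s as well
    return t % (a * b)                   # remainder of t on division by ab
```

For example, `crt(2, 3, 3, 5)` returns 8, and indeed 3 divides $8 - 2$ and 5 divides $8 - 3$.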

If n is a positive integer, we define $\varphi(n)$ to be the number of integers k with
$0 \le k < n$ such that $\mathrm{GCD}(k, n) = 1$ (so that $\varphi(1) = 1$, coming from $k = 0$). The function $\varphi$ is called the
Euler $\varphi$ function.

Corollary 1.10. Let $N > 1$ be an integer, and let $N = p_1^{k_1} \cdots p_r^{k_r}$ be a prime
factorization of N. Then

$$
\varphi(N) = \prod_{j=1}^{r} p_j^{k_j - 1}(p_j - 1).
$$

Remark.  The  conclusion  is  valid  also  for  N  =  1  if  we  interpret  the  right  side 
of  the  formula  to  be  the  empty  product. 

Proof. For positive integers a and b, let us check that

$$
\varphi(ab) = \varphi(a)\varphi(b) \qquad \text{if } \mathrm{GCD}(a, b) = 1. \tag{$*$}
$$

In view of Corollary 1.9, it is enough to prove that the mapping $(r, s) \mapsto n$ given
in that corollary has the property that $\mathrm{GCD}(r, a) = \mathrm{GCD}(s, b) = 1$ if and only if
$\mathrm{GCD}(n, ab) = 1$.

To see this property, suppose that n satisfies $0 \le n < ab$ and $\mathrm{GCD}(n, ab) > 1$.
Choose a prime p dividing both n and ab. By Lemma 1.6, p divides a or p divides
b. By symmetry we may assume that p divides a. If (r, s) is the pair corresponding
to n under Corollary 1.9, then the corollary says that a divides $n - r$. Since p
divides a, p divides $n - r$. Since p divides n, p divides r. Thus $\mathrm{GCD}(r, a) > 1$.

Conversely suppose that (r, s) is a pair with $0 \le r < a$ and $0 \le s < b$ such
that $\mathrm{GCD}(r, a) = \mathrm{GCD}(s, b) = 1$ is false. Without loss of generality, we may
assume that $\mathrm{GCD}(r, a) > 1$. Choose a prime p dividing both r and a. If n is the
integer with $0 \le n < ab$ that corresponds to (r, s) under Corollary 1.9, then the
corollary says that a divides $n - r$. Since p divides a, p divides $n - r$. Since p
divides r, p divides n. Thus $\mathrm{GCD}(n, ab) > 1$. This completes the proof of $(*)$.

For a power $p^k$ of a prime p with $k > 0$, the integers n with $0 \le n < p^k$
such that $\mathrm{GCD}(n, p^k) > 1$ are the multiples of p, namely $0, p, 2p, \ldots, p^k - p$.
There are $p^{k-1}$ of them. Thus the number of integers n with $0 \le n < p^k$ such
that $\mathrm{GCD}(n, p^k) = 1$ is $p^k - p^{k-1} = p^{k-1}(p - 1)$. In other words,

$$
\varphi(p^k) = p^{k-1}(p - 1) \qquad \text{if } p \text{ is prime and } k \ge 1. \tag{$**$}
$$

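To prove the corollary, we induct on r, the case $r = 1$ being handled by $(**)$. If
the formula of the corollary is valid for $r - 1$, then, since $p_1^{k_1} \cdots p_{r-1}^{k_{r-1}}$ and $p_r^{k_r}$ have no prime factor in common and are therefore relatively prime, $(*)$ allows us to combine that
result with the formula for $\varphi(p_r^{k_r})$ given in $(**)$ to obtain the formula for $\varphi(N)$.

□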
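A quick Python comparison of Corollary 1.10 with the definition of $\varphi$ may be reassuring. The function names are ours, the factorization is passed as a dictionary `{p_j: k_j}` with every $k_j \ge 1$, and `math.gcd` from the standard library supplies the greatest common divisor.

```python
from math import gcd

def euler_phi_formula(factors):
    """phi(N) = product of p**(k-1) * (p-1) over N = product of p**k (Corollary 1.10).

    Assumes every exponent k is at least 1; {} stands for N = 1 (empty product).
    """
    result = 1
    for p, k in factors.items():
        result *= p ** (k - 1) * (p - 1)
    return result

def euler_phi_count(n):
    """phi(n) from the definition: count the k with 0 <= k < n and GCD(k, n) = 1."""
    return sum(1 for k in range(n) if gcd(k, n) == 1)
```

For $N = 36 = 2^2 \cdot 3^2$ both functions give $2 \cdot 1 \cdot 3 \cdot 2 = 12$.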

We  conclude  this  section  by  extending  the  notion  of  greatest  common  divisor  to 
apply to more than two integers. If $a_1, \ldots, a_t$ are integers not all 0, their greatest
common divisor is the largest integer $d > 0$ that divides all of $a_1, \ldots, a_t$. This
exists, and we write $d = \mathrm{GCD}(a_1, \ldots, a_t)$ for it. It is immediate that d equals the
greatest common divisor of the nonzero members of the set $\{a_1, \ldots, a_t\}$. Thus,
in  deriving  properties  of  greatest  common  divisors,  we  may  assume  that  all  the 
integers  are  nonzero. 

Corollary 1.11. Let $a_1, \ldots, a_t$ be positive integers, and let d be their greatest
common divisor. Then

(a) if for each j with $1 \le j \le t$, $a_j = p_1^{k_{1j}} \cdots p_r^{k_{rj}}$ is an expansion of $a_j$ as
a product of powers of r distinct primes $p_1, \ldots, p_r$, it follows that

$$
d = p_1^{\min_{1 \le j \le t}\{k_{1j}\}} \cdots p_r^{\min_{1 \le j \le t}\{k_{rj}\}},
$$

(b) any divisor d' of all of $a_1, \ldots, a_t$ necessarily divides d,

(c) $d = \mathrm{GCD}(\mathrm{GCD}(a_1, \ldots, a_{t-1}), a_t)$ if $t > 1$,

(d) there exist integers $x_1, \ldots, x_t$ such that $a_1 x_1 + \cdots + a_t x_t = d$.

PROOF. Part (a) is proved in the same way as Corollary 1.8 except that Corollary
1.7 is to be applied t times rather than just twice. Further application of Corollary
1.7 shows that any positive divisor d' of $a_1, \ldots, a_t$ is of the form $d' = p_1^{m_1} \cdots p_r^{m_r}$
with $m_1 \le k_{1j}$ for all j, ..., and with $m_r \le k_{rj}$ for all j. Therefore
$m_1 \le \min_{1 \le j \le t}\{k_{1j}\}$, ..., and $m_r \le \min_{1 \le j \le t}\{k_{rj}\}$, and it follows that d' divides
d. This proves (b). Conclusion (c) follows by using the formula in (a), and (d)
follows by combining (c), Proposition 1.2c, and induction. □
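Parts (c) and (d) suggest computing $\mathrm{GCD}(a_1, \ldots, a_t)$ and the coefficients $x_j$ one integer at a time. The Python sketch below does this; the helper names are hypothetical, the inputs are assumed to be positive integers as in the corollary, and the loop is the induction in the proof of (d) written iteratively.

```python
def ext_gcd(a, b):
    """Return (d, x, y) with d = GCD(a, b) and a*x + b*y == d, for positive a, b."""
    r0, x0, y0 = a, 1, 0
    r1, x1, y1 = b, 0, 1
    while r1 != 0:
        q = r0 // r1
        r0, r1 = r1, r0 - q * r1
        x0, x1 = x1, x0 - q * x1
        y0, y1 = y1, y0 - q * y1
    return r0, x0, y0

def gcd_many(nums):
    """Return (d, xs) with d = GCD(a_1, ..., a_t) and sum of a_j * xs[j] equal to d."""
    if not nums or any(a <= 0 for a in nums):
        raise ValueError("expects a nonempty list of positive integers")
    d, xs = nums[0], [1]                 # GCD(a_1) = a_1 = a_1 * 1
    for a in nums[1:]:
        # Part (c): GCD(a_1, ..., a_j) = GCD(GCD(a_1, ..., a_{j-1}), a_j).
        d, u, v = ext_gcd(d, a)          # new d = u * (old d) + v * a
        xs = [u * x for x in xs] + [v]   # so the old coefficients scale by u
    return d, xs
```

For example, `gcd_many([12, 18, 30])` returns `(6, [-1, 1, 0])`, and indeed $12(-1) + 18(1) + 30(0) = 6$.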
3. Unique Factorization of Polynomials

This section establishes unique factorization for ordinary rational, real, and
complex polynomials. We write Q for the set of rational numbers, R for the set of
real  numbers,  and  C  for  the  set  of  complex  numbers,  each  with  its  arithmetic 
operations.  The  rational  numbers  are  constructed  from  the  integers  by  a  process 
reviewed  in  Section  A3  of  the  appendix,  the  real  numbers  are  defined  from  the 
rational  numbers  by  a  process  reviewed  in  that  same  section,  and  the  complex 
numbers  are  defined  from  the  real  numbers  by  a  process  reviewed  in  Section  A4 
of  the  appendix.  Sections  A3  and  A4  of  the  appendix  mention  special  properties 
of  R  and  C  beyond  those  of  the  arithmetic  operations,  but  we  shall  not  make 
serious use of these special properties here until nearly the end of the section,
after unique factorization of polynomials has been established. Let F denote any
of Q, R, or C. The members of F are called scalars.

We work with ordinary polynomials with coefficients in F. Informally these
are expressions $P(X) = a_n X^n + \cdots + a_1 X + a_0$ with $a_n, \ldots, a_1, a_0$ in F. Although
it is tempting to think of P(X) as a function with independent variable X, it is
better to identify P with the sequence $(a_0, a_1, \ldots, a_n, 0, 0, \ldots)$ of coefficients,
using expressions $P(X) = a_n X^n + \cdots + a_1 X + a_0$ only for conciseness and for
motivation of the definitions of various operations.

The  precise  definition  therefore  is  that  a  polynomial  in  one  indeterminate 
with  coefficients  in  F  is  an  infinite  sequence  of  members  of  F  such  that  all  terms 
of  the  sequence  are  0  from  some  point  on.  The  indexing  of  the  sequence  is  to  begin 
with 0. We may refer to a polynomial P as P(X) if we want to emphasize that
the  indeterminate  is  called  X.  Addition,  subtraction,  and  scalar  multiplication 
are  defined  in  coordinate-by-coordinate  fashion: 

$$
\begin{aligned}
(a_0, a_1, \ldots, a_n, 0, 0, \ldots) + (b_0, b_1, \ldots, b_n, 0, 0, \ldots)
&= (a_0 + b_0, a_1 + b_1, \ldots, a_n + b_n, 0, 0, \ldots),\\
(a_0, a_1, \ldots, a_n, 0, 0, \ldots) - (b_0, b_1, \ldots, b_n, 0, 0, \ldots)
&= (a_0 - b_0, a_1 - b_1, \ldots, a_n - b_n, 0, 0, \ldots),\\
c(a_0, a_1, \ldots, a_n, 0, 0, \ldots) &= (ca_0, ca_1, \ldots, ca_n, 0, 0, \ldots).
\end{aligned}
$$

Polynomial multiplication is defined so as to match multiplication of expressions
$a_n X^n + \cdots + a_1 X + a_0$ if the product is expanded out, exponents of X are added (so that $X^k X^l = X^{k+l}$),
and then terms containing like powers of X are collected:

$$
(a_0, a_1, \ldots, 0, 0, \ldots)(b_0, b_1, \ldots, 0, 0, \ldots) = (c_0, c_1, \ldots, 0, 0, \ldots),
$$

where $c_N = \sum_{k=0}^{N} a_k b_{N-k}$. We take it as known that the usual associative,
commutative, and distributive laws are then valid. The set of all polynomials in
the indeterminate X is denoted by F[X].
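To make the sequence definition concrete, here is a Python sketch that stores a polynomial as the finite list $[a_0, a_1, \ldots, a_n]$ standing for $(a_0, a_1, \ldots, a_n, 0, 0, \ldots)$, with trailing zeros dropped so that the zero polynomial is the empty list. This representation and the function names are our own assumptions; the coefficients can be any Python numbers, for instance `fractions.Fraction` values when F = Q.

```python
from itertools import zip_longest

def trim(p):
    """Drop trailing zero coefficients; the zero polynomial becomes []."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def poly_add(p, q):
    """Coordinate-by-coordinate sum, padding the shorter list with zeros."""
    return trim(a + b for a, b in zip_longest(p, q, fillvalue=0))

def poly_sub(p, q):
    """Coordinate-by-coordinate difference."""
    return trim(a - b for a, b in zip_longest(p, q, fillvalue=0))

def poly_scale(c, p):
    """Scalar multiple c*P."""
    return trim(c * a for a in p)

def poly_mul(p, q):
    """Product: coefficient N is the sum of a_k * b_(N-k) over 0 <= k <= N."""
    if not p or not q:
        return []
    c = [0] * (len(p) + len(q) - 1)
    for k, a in enumerate(p):
        for j, b in enumerate(q):
            c[k + j] += a * b   # X^k * X^j contributes to X^(k+j)
    return trim(c)
```

For example, `poly_mul([1, 0, 1], [-1, 1])` computes $(1 + X^2)(X - 1) = -1 + X - X^2 + X^3$ as `[-1, 1, -1, 1]`.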
The polynomial with all entries 0 is denoted by 0 and is called the zero
polynomial. For all polynomials $P = (a_0, \ldots, a_n, 0, \ldots)$ other than 0, the
degree of P, denoted by $\deg P$, is defined to be the largest index n such that
$a_n \neq 0$. The constant polynomials are by definition the zero polynomial and the
polynomials of degree 0. If P and Q are nonzero polynomials and c is a nonzero scalar, then

$$
\begin{aligned}
&P + Q = 0 \quad\text{or}\quad \deg(P + Q) \le \max(\deg P, \deg Q),\\
&\deg(cP) = \deg P,\\
&\deg(PQ) = \deg P + \deg Q.
\end{aligned}
$$

In the formula for $\deg(P + Q)$, equality holds if $\deg P \neq \deg Q$. Implicit in the
formula for $\deg(PQ)$ is the fact that PQ cannot be 0 unless $P = 0$ or $Q = 0$. A
cancellation law for multiplication is an immediate consequence:

$$
PR = QR \text{ with } R \neq 0 \quad\text{implies}\quad P = Q.
$$

In fact, $PR = QR$ implies $(P - Q)R = 0$; since $R \neq 0$, $P - Q$ must be 0.

If $P = (a_0, \ldots, a_n, 0, \ldots)$ is a polynomial and r is in F, we can evaluate P
at r, obtaining as a result the number $P(r) = a_n r^n + \cdots + a_1 r + a_0$. Taking into
account all values of r, we obtain a mapping $P \mapsto P(\cdot)$ of F[X] into the set of
functions  from  F  into  F.  Because  of  the  way  that  the  arithmetic  operations  on 
polynomials  have  been  defined,  we  have 

$$
\begin{aligned}
(P + Q)(r) &= P(r) + Q(r),\\
(P - Q)(r) &= P(r) - Q(r),\\
(cP)(r) &= cP(r),\\
(PQ)(r) &= P(r)Q(r).
\end{aligned}
$$

In other words, the mapping $P \mapsto P(\cdot)$ respects the arithmetic operations. We
say that r is a root of P if $P(r) = 0$.
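Evaluation can be sketched in Python with Horner's rule, $P(r) = (\cdots((a_n r + a_{n-1})r + a_{n-2})\cdots)r + a_0$, a standard rearrangement that the text does not use. The sketch assumes the list representation $[a_0, \ldots, a_n]$ (our choice) and includes a small multiplication helper so that the identity $(PQ)(r) = P(r)Q(r)$ can be spot-checked; names are illustrative.

```python
def evaluate(p, r):
    """Return P(r) for P given as [a_0, ..., a_n], using Horner's rule."""
    value = 0
    for a in reversed(p):          # a_n first, a_0 last
        value = value * r + a
    return value

def poly_mul(p, q):
    """Product of coefficient lists: coefficient N is the sum of a_k * b_(N-k)."""
    if not p or not q:
        return []
    c = [0] * (len(p) + len(q) - 1)
    for k, a in enumerate(p):
        for j, b in enumerate(q):
            c[k + j] += a * b
    return c
```

A spot check such as comparing `evaluate(poly_mul(P, Q), r)` with `evaluate(P, r) * evaluate(Q, r)` for a few r only illustrates that evaluation respects multiplication; it is not a proof.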

Now  we  turn  to  the  question  of  unique  factorization.  The  definitions  and  the 
proof  are  completely  analogous  to  those  for  the  integers.  A  factor  of  a  polynomial 
A  is  a  nonzero  polynomial  B  such  that  A  =  B  Q  for  some  polynomial  Q.  In 
this  case  we  say  also  that  B  divides  A,  that  B  is  a  divisor  of  A,  and  that  A  is  a 
multiple of B. We write $B \mid A$ for this relationship. If A is nonzero, any product
formula $A = BQ_1 \cdots Q_r$ is a factorization of A. A unit in F[X] is a divisor of 1,
hence is a polynomial of degree 0, since $\deg(BQ) = \deg B + \deg Q$; conversely, every polynomial of degree 0 is a unit. Such a polynomial is a constant polynomial
$A(X) = c$ with c equal to a nonzero scalar. The factorization $A = BQ$ of
$A \neq 0$ is called nontrivial if neither B nor Q is a unit. A prime P in F[X] is a
nonzero  polynomial  that  is  not  a  unit  and  has  no  nontrivial  factorization  P  =  BQ. 
Observe  that  the  product  of  a  prime  and  a  unit  is  always  a  prime. 




Proposition 1.12 (division algorithm). If A and B are polynomials in F[X]
and if B is not the 0 polynomial, then there exist unique polynomials Q and R in
F[X]  such  that 

(a)  A  =  B  Q  +  R  and 

(b)  either  R  is  the  0  polynomial  or  deg  R  <  deg  B. 

Remark.  This  result  codifies  the  usual  method  of  dividing  polynomials  in 
high-school  algebra.  That  method  writes  A/ B  =  Q  +  R/B,  and  then  one  obtains 
the  above  result  by  multiplying  by  B.  The  polynomial  Q  is  the  quotient  in  the 
division,  and  R  is  the  remainder. 

PROOF OF UNIQUENESS. If $A = BQ + R = BQ_1 + R_1$, then $B(Q - Q_1) = R_1 - R$. If
$R_1 - R$ is the 0 polynomial, then $B(Q - Q_1) = 0$ with $B \neq 0$ forces $Q = Q_1$ as well, and we are done. Otherwise $Q - Q_1 \neq 0$, and

$$
\deg B + \deg(Q - Q_1) = \deg(R_1 - R) \le \max(\deg R, \deg R_1) < \deg B,
$$

which is a contradiction. □

Proof of existence. If $A = 0$ or $\deg A < \deg B$, we take $Q = 0$ and
$R = A$, and we are done. Otherwise we induct on $\deg A$. Assume the result
for degree $\le n - 1$, and let $\deg A = n$. Write $A = a_n X^n + A_1$ with $A_1 = 0$
or $\deg A_1 < \deg A$. Let $B = b_k X^k + B_1$ with $B_1 = 0$ or $\deg B_1 < \deg B$. Since $n \ge k$, we can put
$Q_1 = a_n b_k^{-1} X^{n-k}$. Then

$$
A - BQ_1 = a_n X^n + A_1 - a_n X^n - a_n b_k^{-1} X^{n-k} B_1 = A_1 - a_n b_k^{-1} X^{n-k} B_1
$$

with the right side equal to 0 or of degree $< \deg A$. Then the right side, by
induction, is of the form $BQ_2 + R$, and $A = B(Q_1 + Q_2) + R$ is the required
decomposition. □
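The existence proof is an algorithm: repeatedly subtract $a_n b_k^{-1} X^{n-k} B$ to remove the leading term. The Python sketch below performs that step in a loop rather than by recursion. It assumes coefficient lists $[a_0, \ldots, a_n]$ (our representation, with a hypothetical function name) and converts coefficients to `fractions.Fraction` so that dividing by $b_k$ is exact, which models the case F = Q.

```python
from fractions import Fraction

def trim(p):
    """Drop trailing zero coefficients; the zero polynomial becomes []."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def poly_divmod(a, b):
    """Return (Q, R) with A = B*Q + R and R == [] or deg R < deg B (Proposition 1.12)."""
    a = trim(Fraction(c) for c in a)
    b = trim(Fraction(c) for c in b)
    if not b:
        raise ZeroDivisionError("B must not be the zero polynomial")
    k, b_lead = len(b) - 1, b[-1]
    q = [Fraction(0)] * max(len(a) - k, 0)
    r = a
    while r and len(r) - 1 >= k:
        n = len(r) - 1
        coeff = r[-1] / b_lead            # a_n * b_k^{-1}, the leading coefficient of Q_1
        q[n - k] += coeff
        # Subtract coeff * X^(n-k) * B; this cancels the X^n term of r exactly.
        for i, c in enumerate(b):
            r[n - k + i] -= coeff * c
        r = trim(r)
    return trim(q), r
```

For $A = X^3 - 2X + 1$ and $B = X - 1$, `poly_divmod([1, -2, 0, 1], [-1, 1])` returns $Q = X^2 + X - 1$ (as Fractions equal to `[-1, 1, 1]`) and $R = 0$ (the empty list).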

Corollary  1.13  (Factor  Theorem).  If  r  is  in  F  and  if  P  is  a  polynomial  in 
F[X],  then  X  —  r  divides  P  if  and  only  if  P(r)  =  0. 

PROOF. If $P = (X - r)Q$, then $P(r) = (r - r)Q(r) = 0$. Conversely let
$P(r) = 0$. Taking $B(X) = X - r$ in the division algorithm (Proposition 1.12),
we obtain $P = (X - r)Q + R$ with $R = 0$ or $\deg R < \deg(X - r) = 1$.
Thus R is a constant polynomial, possibly 0. In any case we have
$0 = P(r) = (r - r)Q(r) + R(r)$, and thus $R(r) = 0$. Since R is constant, we must have
$R = 0$, and then $P = (X - r)Q$. □
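Dividing by $X - r$ can be done by synthetic division, and the remainder it produces is exactly $P(r)$, which is the Factor Theorem in computational form. A Python sketch (our own helper name; coefficients listed as $[a_0, \ldots, a_n]$):

```python
def divide_by_linear(p, r):
    """Return (Q, R) with P = (X - r)*Q + R and R a constant (synthetic division)."""
    if not p:
        return [], 0
    q = []
    acc = 0
    for a in reversed(p):        # a_n first
        acc = acc * r + a        # running Horner value
        q.append(acc)
    remainder = q.pop()          # the final Horner value is P(r)
    q.reverse()                  # coefficients of Q, lowest degree first
    return q, remainder
```

With $P = X^3 - 2X + 1$, `divide_by_linear([1, -2, 0, 1], 1)` returns `([-1, 1, 1], 0)`, so $X - 1$ divides P, while `divide_by_linear([1, -2, 0, 1], 2)` returns `([2, 2, 1], 5)`, with remainder $P(2) = 5$.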

Corollary 1.14. If $P$ is a nonzero polynomial with coefficients in $\mathbb{F}$ and if $\deg P = n$, then $P$ has at most $n$ distinct roots.
Remarks. Since there are infinitely many scalars in any of $\mathbb{Q}$ and $\mathbb{R}$ and $\mathbb{C}$, the corollary implies that the function from $\mathbb{F}$ to $\mathbb{F}$ associated to $P$, namely $r \mapsto P(r)$, cannot be identically $0$ if $P \ne 0$. Starting in Chapter IV, we shall allow other $\mathbb{F}$'s besides $\mathbb{Q}$ and $\mathbb{R}$ and $\mathbb{C}$, and then this implication can fail. For example, when $\mathbb{F}$ is the two-element “field” $\mathbb{F} = \{0, 1\}$ with $1 + 1 = 0$ and with otherwise the expected addition and multiplication, then $P(X) = X^2 + X$ is not the zero polynomial but $P(r) = 0$ for $r = 0$ and $r = 1$. It is thus important to distinguish polynomials in one indeterminate from their associated functions of one variable.

PROOF. Suppose, to the contrary, that $r_1, \dots, r_{n+1}$ are distinct roots of $P(X)$. By the Factor Theorem (Corollary 1.13), $X - r_1$ is a factor of $P(X)$. We prove inductively on $k$ that the product $(X - r_1)(X - r_2)\cdots(X - r_k)$ is a factor of $P(X)$. Assume that this assertion holds for $k$, so that $P(X) = (X - r_1)\cdots(X - r_k)Q(X)$ and

$$0 = P(r_{k+1}) = (r_{k+1} - r_1)\cdots(r_{k+1} - r_k)\,Q(r_{k+1}).$$

Since the $r_j$'s are distinct, we must have $Q(r_{k+1}) = 0$. By the Factor Theorem, we can write $Q(X) = (X - r_{k+1})R(X)$ for some polynomial $R(X)$. Substitution gives $P(X) = (X - r_1)\cdots(X - r_k)(X - r_{k+1})R(X)$, and $(X - r_1)\cdots(X - r_{k+1})$ is exhibited as a factor of $P(X)$. This completes the induction. Consequently

$$P(X) = (X - r_1)\cdots(X - r_{n+1})\,S(X)$$

for some polynomial $S(X)$. Comparing the degrees of the two sides, we find that $\deg S = -1$, and we have a contradiction. □
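The two-element example in the Remarks above can be checked directly. In this sketch the “field” $\{0, 1\}$ is modeled, as an assumption, by integer arithmetic reduced mod 2, and `evaluate_mod` is a hypothetical helper:

```python
def evaluate_mod(P, r, p):
    """Evaluate the integer polynomial P = [a_0, ..., a_n] at r, reducing mod p."""
    value = 0
    for a in reversed(P):           # Horner's scheme
        value = (value * r + a) % p
    return value


# P(X) = X^2 + X, coefficients listed from the constant term up.
P = [0, 1, 1]

# Over {0, 1} with 1 + 1 = 0, the associated function vanishes at both points ...
values_mod_2 = [evaluate_mod(P, r, 2) for r in (0, 1)]
# ... although P has nonzero coefficients; over Q, by contrast, P(1) = 2.
value_over_Q_at_1 = sum(a * 1 ** i for i, a in enumerate(P))
```

So the polynomial $X^2 + X$ and the zero polynomial define the same function on $\{0, 1\}$, which is why the text keeps polynomials and their associated functions separate.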


We can use the division algorithm in the same way as with the integers in Sections 1-2 to obtain unique factorization. Within the set of integers, we defined greatest common divisors so as to be positive, but their negatives would have worked equally well. That flexibility persists with polynomials; the essential feature of any greatest common divisor of polynomials is shared by any product of that polynomial by a unit. A greatest common divisor of polynomials $A$ and $B$ with $B \ne 0$ is any polynomial $D$ of maximum degree such that $D$ divides $A$ and $D$ divides $B$. We shall see that $D$ is indeed unique up to multiplication by a nonzero scalar.²


² For some purposes it is helpful to isolate one particular greatest common divisor by taking the coefficient of the highest power of $X$ to be $1$.
The Euclidean algorithm is the iterative process that makes use of the division algorithm in the form

$$
\begin{aligned}
A &= BQ_1 + R_1, && R_1 = 0 \text{ or } \deg R_1 < \deg B,\\
B &= R_1Q_2 + R_2, && R_2 = 0 \text{ or } \deg R_2 < \deg R_1,\\
R_1 &= R_2Q_3 + R_3, && R_3 = 0 \text{ or } \deg R_3 < \deg R_2,\\
&\;\;\vdots\\
R_{n-2} &= R_{n-1}Q_n + R_n, && R_n = 0 \text{ or } \deg R_n < \deg R_{n-1},\\
R_{n-1} &= R_nQ_{n+1}.
\end{aligned}
$$

In the above computation the integer $n$ is defined by the conditions that $R_n \ne 0$ and that $R_{n+1} = 0$. Such an $n$ must exist since $\deg B > \deg R_1 > \deg R_2 > \cdots \ge 0$. We can now obtain an analog for $\mathbb{F}[X]$ of the result for $\mathbb{Z}$ given as Proposition 1.2.

Proposition 1.15. Let $A$ and $B$ be polynomials in $\mathbb{F}[X]$ with $B \ne 0$, and let $R_1, \dots, R_n$ be the remainders generated by the Euclidean algorithm when applied to $A$ and $B$. Then

(a) $R_n$ is a greatest common divisor of $A$ and $B$,

(b) any $D_1$ that divides both $A$ and $B$ necessarily divides $R_n$,

(c) the greatest common divisor of $A$ and $B$ is unique up to multiplication by a nonzero scalar,

(d) any greatest common divisor $D$ has the property that there exist polynomials $P$ and $Q$ with $AP + BQ = D$.

PROOF. Conclusions (a) and (b) are proved in the same way that parts (a) and (b) of Proposition 1.2 are proved, and conclusion (d) is proved with $D = R_n$ in the same way that Proposition 1.2c is proved.

If $D$ is a greatest common divisor of $A$ and $B$, it follows from (a) and (b) that $D$ divides $R_n$ and that $\deg D = \deg R_n$. Hence $R_n = cD$ for some nonzero scalar $c$. This proves (c), and multiplying the identity $AP + BQ = R_n$ by $c^{-1}$ yields (d) for $D$. □
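Proposition 1.15 can be tried out on small examples. This sketch assumes coefficients in $\mathbb{Q}$, represented by `Fraction` lists from the constant term up; `remainder`, `euclid_remainders`, and `poly_gcd` are hypothetical names, and `poly_gcd` rescales the last nonzero remainder $R_n$ to have leading coefficient $1$, as in footnote 2:

```python
from fractions import Fraction


def trim(p):
    """Convert to Fraction and drop trailing zeros; [] is the zero polynomial."""
    p = [Fraction(c) for c in p]
    while p and p[-1] == 0:
        p.pop()
    return p


def remainder(A, B):
    """Remainder of A on division by the nonzero polynomial B (Proposition 1.12)."""
    R, B = trim(A), trim(B)
    k = len(B) - 1
    while R and len(R) - 1 >= k:
        shift, c = len(R) - 1 - k, R[-1] / B[-1]
        for i, b in enumerate(B):   # subtract c X^shift B to cancel the leading term
            R[shift + i] -= c * b
        R = trim(R)
    return R


def euclid_remainders(A, B):
    """The nonzero remainders R_1, ..., R_n of the Euclidean algorithm for A and B."""
    remainders = []
    prev, cur = trim(A), trim(B)
    while True:
        nxt = remainder(prev, cur)
        if not nxt:                 # R_{n+1} = 0 stops the process
            return remainders
        remainders.append(nxt)
        prev, cur = cur, nxt


def poly_gcd(A, B):
    """A greatest common divisor of A and B (B nonzero), scaled to be monic."""
    rs = euclid_remainders(A, B)
    D = rs[-1] if rs else trim(B)   # if R_1 = 0, then B itself divides A
    return [c / D[-1] for c in D]
```

For $A = X^2 - 3X + 2$ and $B = X^2 + 2X - 3$, which share the factor $X - 1$, the only nonzero remainder is $R_1 = 5 - 5X$, and its monic multiple $X - 1$ is returned.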

Using Proposition 1.15, we can prove analogs for $\mathbb{F}[X]$ of the two corollaries of Proposition 1.2. But let us instead skip directly to what is needed to obtain an analog for $\mathbb{F}[X]$ of unique factorization as in Theorem 1.5.

Lemma 1.16. If $A$ and $B$ are nonzero polynomials with coefficients in $\mathbb{F}$ and if $P$ is a prime polynomial such that $P$ divides $AB$, then $P$ divides $A$ or $P$ divides $B$.

PROOF. If $P$ does not divide $A$, then $1$ is a greatest common divisor of $A$ and $P$, and Proposition 1.15d produces polynomials $S$ and $T$ such that $AS + PT = 1$. Multiplication by $B$ gives $ABS + PTB = B$. Then $P$ divides $ABS$ because it divides $AB$, and $P$ divides $PTB$ because it divides $P$. Hence $P$ divides $B$. □
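The polynomials $S$ and $T$ used in this proof can be computed by carrying along, at each stage of the Euclidean algorithm, polynomials that express the current remainder in terms of the two inputs (the extended Euclidean algorithm). The following is a sketch under the same assumed `Fraction`-list representation; all function names are hypothetical:

```python
from fractions import Fraction


def trim(p):
    """Convert to Fraction and drop trailing zeros; [] is the zero polynomial."""
    p = [Fraction(c) for c in p]
    while p and p[-1] == 0:
        p.pop()
    return p


def add(P, Q):
    n = max(len(P), len(Q))
    return trim([(P[i] if i < len(P) else 0) + (Q[i] if i < len(Q) else 0) for i in range(n)])


def scale(c, P):
    return trim([c * a for a in P])


def mul(P, Q):
    if not P or not Q:
        return []
    out = [Fraction(0)] * (len(P) + len(Q) - 1)
    for i, a in enumerate(P):
        for j, b in enumerate(Q):
            out[i + j] += a * b
    return trim(out)


def divide(A, B):
    """(Q, R) with A = BQ + R and R = [] or deg R < deg B, for nonzero B."""
    Q, R, B = [], trim(A), trim(B)
    while R and len(R) >= len(B):
        # the term c X^(deg R - deg B) that cancels the leading term of R
        term = [Fraction(0)] * (len(R) - len(B)) + [R[-1] / B[-1]]
        Q = add(Q, term)
        R = add(R, scale(-1, mul(B, term)))
    return Q, R


def extended_gcd(A, B):
    """Return (D, P, Q) with A*P + B*Q = D, where D = R_n is a gcd of A and B (B != 0)."""
    # Invariants: r0 = A*p0 + B*q0 and r1 = A*p1 + B*q1.
    r0, p0, q0 = trim(A), [Fraction(1)], []
    r1, p1, q1 = trim(B), [], [Fraction(1)]
    while True:
        quot, rem = divide(r0, r1)
        if not rem:
            return r1, p1, q1
        # rem = r0 - quot*r1 = A*(p0 - quot*p1) + B*(q0 - quot*q1)
        p_new = add(p0, scale(-1, mul(quot, p1)))
        q_new = add(q0, scale(-1, mul(quot, q1)))
        r0, p0, q0, r1, p1, q1 = r1, p1, q1, rem, p_new, q_new
```

With $A = X$ and the prime $P = X^2 + 1$, the call `extended_gcd([0, 1], [1, 0, 1])` returns $S = -X$ and $T = 1$, and indeed $X \cdot (-X) + (X^2 + 1) \cdot 1 = 1$.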
Theorem 1.17 (unique factorization). Every member of $\mathbb{F}[X]$ of degree $\ge 1$ is a product of primes. This factorization is unique up to order and up to multiplication of each prime factor by a unit, i.e., by a nonzero scalar.

PROOF. The existence follows in the same way as the existence in Theorem 1.5; induction on the integers is to be replaced by induction on the degree. The uniqueness follows from Lemma 1.16 in the same way that the uniqueness in Theorem 1.5 follows from Lemma 1.6. □

We turn to a consideration of properties of polynomials that take into account special features of $\mathbb{R}$ and $\mathbb{C}$. If $\mathbb{F}$ is $\mathbb{R}$, then $X^2 + 1$ is prime. The reason is that a nontrivial factorization of $X^2 + 1$ would have to involve two first-degree real polynomials, and then $r^2 + 1$ would have to be $0$ for some real $r$, namely for $r$ equal to the root of either of the first-degree polynomials. On the other hand, $X^2 + 1$ is not prime when $\mathbb{F} = \mathbb{C}$, since $X^2 + 1 = (X + i)(X - i)$. The Fundamental Theorem of Algebra, stated below, implies that every prime polynomial over $\mathbb{C}$ is of degree $1$. It is possible to prove the Fundamental Theorem of Algebra within complex analysis as a consequence of Liouville’s Theorem or within real analysis as a consequence of the Heine-Borel Theorem and other facts about compactness. This text gives a proof of the Fundamental Theorem of Algebra in Chapter IX using modern algebra, specifically Sylow theory as in Chapter IV and Galois theory as in Chapter IX. One further fact is needed; this fact uses elementary calculus and is proved below as Proposition 1.20.

Theorem 1.18 (Fundamental Theorem of Algebra). Any polynomial in $\mathbb{C}[X]$ with degree $\ge 1$ has at least one root.

Corollary 1.19. Let $P$ be a nonzero polynomial of degree $n$ in $\mathbb{C}[X]$, and let $r_1, \dots, r_k$ be the distinct roots. Then there exist unique integers $m_j > 0$ for $1 \le j \le k$ such that $P(X)$ is a scalar multiple of $\prod_{j=1}^{k}(X - r_j)^{m_j}$. The numbers $m_j$ have $\sum_{j=1}^{k} m_j = n$.

PROOF. We may assume that $\deg P > 0$. We apply unique factorization (Theorem 1.17) to $P(X)$. It follows from the Fundamental Theorem of Algebra (Theorem 1.18) and the Factor Theorem (Corollary 1.13) that each prime polynomial with coefficients in $\mathbb{C}$ has degree $1$. Thus the unique factorization of $P(X)$ has to be of the form $c\prod_{i=1}^{n}(X - z_i)$ for some $c \ne 0$ and for some complex numbers $z_i$ that are unique up to order. The $z_i$'s are roots, and every root is a $z_i$ by the Factor Theorem. Grouping like factors proves the desired factorization and its uniqueness. The numbers $m_j$ have $\sum_{j=1}^{k} m_j = n$ by a count of degrees. □

The integers $m_j$ in the corollary are called the multiplicities of the roots of the polynomial $P(X)$.


We conclude this section by proving the result from calculus that will enter the proof of the Fundamental Theorem of Algebra in Chapter IX.

Proposition 1.20. Any polynomial in $\mathbb{R}[X]$ with odd degree has at least one root.

PROOF. Without loss of generality, we may take the leading coefficient to be $1$. Thus let the polynomial be $P(X) = X^{2n+1} + a_{2n}X^{2n} + \cdots + a_1X + a_0 = X^{2n+1} + R(X)$. Since $\lim_{x \to \pm\infty} P(x)/x^{2n+1} = 1$, there is some positive $r_0$ such that $P(-r_0) < 0$ and $P(r_0) > 0$. By the Intermediate Value Theorem, given in Section A3 of the appendix, $P(r) = 0$ for some $r$ with $-r_0 < r < r_0$. □
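The proof can be turned into a numerical procedure by bisection, which locates the root promised by the Intermediate Value Theorem. The sketch below uses floating point and, in place of the limit argument, chooses $r_0 = 1 + \max_i |a_i|$; that choice relies on Cauchy's bound for the roots of a polynomial with leading coefficient $1$, an assumption brought in from outside the text. The name `odd_degree_root` is hypothetical.

```python
def evaluate(P, x):
    """Value at x of the real polynomial with coefficients P = [a_0, ..., a_m]."""
    value = 0.0
    for a in reversed(P):           # Horner's scheme
        value = value * x + a
    return value


def odd_degree_root(P, steps=200):
    """Approximate a real root of a polynomial of odd degree with leading coefficient 1."""
    if len(P) % 2 != 0 or P[-1] != 1:
        raise ValueError("expected an odd-degree polynomial with leading coefficient 1")
    # Assumption (Cauchy's bound): every root has absolute value < 1 + max|a_i|,
    # so at x = -r0 and x = r0 the sign of P(x) is the sign of x^(2n+1).
    r0 = 1.0 + max((abs(a) for a in P[:-1]), default=0.0)
    lo, hi = -r0, r0                # invariant: P(lo) < 0 <= P(hi)
    for _ in range(steps):          # a fixed number of halvings always terminates
        mid = (lo + hi) / 2
        if evaluate(P, mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

For example, applied to $X^3 - 2$ (coefficients `[-2, 0, 0, 1]`) it approximates $\sqrt[3]{2}$. A fixed iteration count is used instead of a width tolerance because, for roots of large magnitude, floating-point spacing can exceed any small tolerance.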


4.  Permutations  and  Their  Signs 


Let $S$ be a finite nonempty set of $n$ elements. A permutation of $S$ is a one-one function from $S$ onto $S$. The elements might be listed as $a_1, a_2, \dots, a_n$, but it will simplify the notation to view them simply as $1, 2, \dots, n$. We use ordinary function notation for describing the effect of permutations. Thus the value of a permutation $\sigma$ at $j$ is $\sigma(j)$, and the composition of $\tau$ followed by $\sigma$ is $\sigma \circ \tau$ or simply $\sigma\tau$, with $(\sigma\tau)(j) = \sigma(\tau(j))$. Composition is automatically associative, i.e., $(\rho\sigma)\tau = \rho(\sigma\tau)$, because the effect of both sides on $j$, when we expand things out, is $\rho(\sigma(\tau(j)))$. The composition of two permutations is also called their product.

The identity permutation will be denoted by $1$. Any permutation $\sigma$, being a one-one onto function, has a well-defined inverse permutation $\sigma^{-1}$ with the property that $\sigma\sigma^{-1} = \sigma^{-1}\sigma = 1$. One way of describing concisely the effect of a permutation is to list its domain values and to put the corresponding range values beneath them. Thus

$$\sigma = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 4 & 3 & 5 & 1 & 2 \end{pmatrix}$$

is the permutation of $\{1, 2, 3, 4, 5\}$ with $\sigma(1) = 4$, $\sigma(2) = 3$, $\sigma(3) = 5$, $\sigma(4) = 1$, and $\sigma(5) = 2$. The inverse permutation is obtained by interchanging the two rows to obtain $\begin{pmatrix} 4 & 3 & 5 & 1 & 2 \\ 1 & 2 & 3 & 4 & 5 \end{pmatrix}$ and then adjusting the entries in the rows so that the first row is in the usual order:

$$\sigma^{-1} = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 4 & 5 & 2 & 1 & 3 \end{pmatrix}.$$
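Two-row notation translates directly into code. In this sketch a permutation of $\{1, \dots, n\}$ is stored, by assumption, as the tuple of its bottom row; `compose` and `inverse` are hypothetical helpers, and `compose(s, t)` applies `t` first, matching the convention $(\sigma\tau)(j) = \sigma(\tau(j))$:

```python
# Sketch: a permutation of {1, ..., n} is stored as the bottom row of its
# two-row notation, i.e. the tuple (sigma(1), ..., sigma(n)).


def compose(s, t):
    """The product s t = s o t (apply t first, then s), as a bottom row."""
    return tuple(s[t[j] - 1] for j in range(len(t)))


def inverse(perm):
    """Interchange the two rows, then reorder columns so the top row is 1, ..., n."""
    inv = [0] * len(perm)
    for j, image in enumerate(perm, start=1):
        inv[image - 1] = j          # perm(j) = image means perm^{-1}(image) = j
    return tuple(inv)


sigma = (4, 3, 5, 1, 2)
identity = tuple(range(1, len(sigma) + 1))
sigma_inv = inverse(sigma)          # (4, 5, 2, 1, 3), matching the text
```

Associativity of `compose` holds for the same reason given in the text: both groupings send `j` to `rho(sigma(tau(j)))`.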


If $2 \le k \le n$, a $k$-cycle is a permutation $\sigma$ that fixes each element in some subset of $n - k$ elements and moves the remaining elements $c_1, \dots, c_k$ according to $\sigma(c_1) = c_2$, $\sigma(c_2) = c_3$, $\dots$, $\sigma(c_{k-1}) = c_k$, $\sigma(c_k) = c_1$. Such a cycle may be denoted by $(c_1\ c_2\ \cdots\ c_{k-1}\ c_k)$ to stress its structure. For example take $n = 5$; then $\sigma = (2\ 3\ 5)$ is the $3$-cycle given in our earlier notation by

$$\begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 1 & 3 & 5 & 4 & 2 \end{pmatrix}.$$
The cycle $(2\ 3\ 5)$ is the same as the cycle $(3\ 5\ 2)$ and the cycle $(5\ 2\ 3)$. It is sometimes helpful to speak of the identity permutation $1$ as the unique $1$-cycle.

A system of cycles is said to be disjoint if the sets that each of them moves are disjoint in pairs. Thus $(2\ 3\ 5)$ and $(1\ 4)$ are disjoint, but $(2\ 3\ 5)$ and $(1\ 3)$ are not. Any two disjoint cycles $\sigma$ and $\tau$ commute in the sense that $\sigma\tau = \tau\sigma$.

Proposition 1.21. Any permutation $\sigma$ of $\{1, 2, \dots, n\}$ is a product of disjoint cycles. The individual cycles in the decomposition are unique in the sense of being determined by $\sigma$.

Example. $\begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 4 & 3 & 5 & 1 & 2 \end{pmatrix} = (2\ 3\ 5)(1\ 4)$.

PROOF. Let us prove existence. Working with $\{1, 2, \dots, n\}$, we show that any $\sigma$ is the disjoint product of cycles in such a way that no cycle moves an element $j$ unless $\sigma$ moves $j$. We do so for all $\sigma$ simultaneously by induction downward on the number of elements fixed by $\sigma$. The starting case of the induction is that $\sigma$ fixes all $n$ elements. Then $\sigma$ is the identity, and we are regarding the identity as a $1$-cycle.

For the inductive step suppose $\sigma$ fixes the elements in a subset $T$ of $r$ elements of $\{1, 2, \dots, n\}$ with $r < n$. Let $j$ be an element not in $T$, so that $\sigma(j) \ne j$. Choose $k$ as small as possible so that some element is repeated among $j, \sigma(j), \sigma^2(j), \dots, \sigma^k(j)$. This condition means that $\sigma^l(j) = \sigma^k(j)$ for some $l$ with $0 \le l < k$. Then $\sigma^{k-l}(j) = j$, and we obtain a contradiction to the minimality of $k$ unless $k - l = k$, i.e., $l = 0$. In other words, we have $\sigma^k(j) = j$. We may thus form the $k$-cycle $\gamma = (j\ \sigma(j)\ \sigma^2(j)\ \cdots\ \sigma^{k-1}(j))$. The permutation $\gamma^{-1}\sigma$ then fixes the $r + k$ elements of $T \cup U$, where $U$ is the set of elements $j, \sigma(j), \sigma^2(j), \dots, \sigma^{k-1}(j)$. By the inductive hypothesis, $\gamma^{-1}\sigma$ is the product $\tau_1\cdots\tau_p$ of disjoint cycles that move only elements not in $T \cup U$. Since $\gamma$ moves only the elements in $U$, $\gamma$ is disjoint from each of $\tau_1, \dots, \tau_p$. Therefore $\sigma = \gamma\tau_1\cdots\tau_p$ provides the required decomposition of $\sigma$.

For uniqueness we observe from the proof of existence that each element $j$ generates a $k$-cycle $C_j$ for some $k \ge 1$ depending on $j$. If we have two decompositions as in the proposition, then the cycle within each decomposition that contains $j$ must be $C_j$. Hence the cycles in the two decompositions must match. □
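The existence proof also describes an algorithm: follow $j, \sigma(j), \sigma^2(j), \dots$ until the sequence returns to $j$, then start again from an element not yet used. A minimal sketch, with permutations stored as bottom rows and hypothetical helper names `disjoint_cycles` and `from_cycles`:

```python
def disjoint_cycles(perm):
    """Disjoint-cycle decomposition of perm, given as (perm(1), ..., perm(n)).

    As in the existence proof, an element j not yet accounted for generates
    the cycle (j perm(j) perm^2(j) ...), which closes up when it returns to j.
    Each cycle is listed starting from its smallest element; 1-cycles
    (fixed points) are omitted.
    """
    seen, cycles = set(), []
    for j in range(1, len(perm) + 1):
        if j in seen:
            continue
        cycle, k = [j], perm[j - 1]
        seen.add(j)
        while k != j:
            cycle.append(k)
            seen.add(k)
            k = perm[k - 1]
        if len(cycle) > 1:
            cycles.append(tuple(cycle))
    return cycles


def from_cycles(cycles, n):
    """Rebuild the bottom row (perm(1), ..., perm(n)) from disjoint cycles."""
    image = list(range(1, n + 1))
    for cycle in cycles:
        for a, b in zip(cycle, cycle[1:] + cycle[:1]):
            image[a - 1] = b        # the cycle sends a to the next entry b
    return tuple(image)
```

For the example permutation the output is `[(1, 4), (2, 3, 5)]`, the same cycles as in the text listed in another order, which is harmless because disjoint cycles commute.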

A $2$-cycle is often called a transposition. The proposition allows us to see quickly that any permutation is a product of transpositions.

Corollary 1.22. Any $k$-cycle $\sigma$ permuting $\{1, 2, \dots, n\}$ is a product of $k - 1$ transpositions if $k > 1$. Therefore any permutation $\sigma$ of $\{1, 2, \dots, n\}$ is a product of transpositions.


PROOF. For the first statement, we observe that $(c_1\ c_2\ \cdots\ c_{k-1}\ c_k) = (c_1\ c_k)(c_1\ c_{k-1})\cdots(c_1\ c_3)(c_1\ c_2)$. The second statement follows by combining this fact with Proposition 1.21. □
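The factorization in the proof can be checked mechanically. In this sketch (hypothetical names, cycles given as tuples), the list of transpositions is read right to left, consistent with the convention that the rightmost factor acts first:

```python
def cycle_to_transpositions(cycle):
    """Factor (c_1 c_2 ... c_k) as (c_1 c_k)(c_1 c_{k-1}) ... (c_1 c_3)(c_1 c_2).

    The product is read right to left, so (c_1 c_2) acts first; there are k - 1 factors.
    """
    c1 = cycle[0]
    return [(c1, c) for c in reversed(cycle[1:])]


def apply_product(transpositions, j):
    """Image of j under a product of transpositions, rightmost factor first."""
    for a, b in reversed(transpositions):
        if j == a:
            j = b
        elif j == b:
            j = a
    return j
```

For $(2\ 3\ 5)$ this gives $(2\ 5)(2\ 3)$, and applying that product to $1, \dots, 5$ reproduces the bottom row $1, 3, 5, 4, 2$ shown earlier for this $3$-cycle.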

Our final tasks for this section are to attach a sign to each permutation and to examine the properties of these signs. We begin with the special case that our underlying set $S$ is $\{1, \dots, n\}$. If $\sigma$ is a permutation of $\{1, \dots, n\}$, consider the numerical products

$$\prod_{1 \le j < k \le n} |\sigma(k) - \sigma(j)| \qquad\text{and}\qquad \prod_{1 \le j < k \le n} (\sigma(k) - \sigma(j)).$$

If $(r, s)$ is any pair of integers with $1 \le r < s \le n$, then the expression $s - r$ appears once and only once as a factor in the first product. Therefore the first product is independent of $\sigma$ and equals $\prod_{1 \le j < k \le n}(k - j)$. Meanwhile, each factor of the second product is $\pm 1$ times the corresponding factor of the first product. Therefore we have

$$\prod_{1 \le j < k \le n} (\sigma(k) - \sigma(j)) = (\operatorname{sgn}\sigma)\prod_{1 \le j < k \le n}(k - j),$$

where $\operatorname{sgn}\sigma$ is $+1$ or $-1$, depending on $\sigma$. This sign is called the sign of the permutation $\sigma$.
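The defining formula for the sign can be evaluated directly with exact integer arithmetic. This is only an illustrative sketch: the function name `sign` is hypothetical, permutations are again stored as bottom rows, and the $O(n^2)$ multiplications make it suitable for small $n$ only.

```python
from itertools import combinations


def sign(perm):
    """sgn of perm = (perm(1), ..., perm(n)), straight from the definition

        prod_{j<k} (perm(k) - perm(j)) = (sgn perm) * prod_{j<k} (k - j).
    """
    signed, unsigned = 1, 1
    for j, k in combinations(range(1, len(perm) + 1), 2):   # all pairs j < k
        signed *= perm[k - 1] - perm[j - 1]
        unsigned *= k - j
    # Both products contain each difference s - r exactly once up to sign.
    assert abs(signed) == unsigned
    return signed // unsigned
```

For the example permutation $\sigma = (2\ 3\ 5)(1\ 4)$, with bottom row `(4, 3, 5, 1, 2)`, it returns $-1$, while the $3$-cycle $(2\ 3\ 5)$, with bottom row `(1, 3, 5, 4, 2)`, gets $+1$.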

Lemma 1.23. Let $\sigma$ be a permutation of $\{1, \dots, n\}$, let $(a\ b)$ be a transposition, and form the product $\sigma(a\ b)$. Then $\operatorname{sgn}(\sigma(a\ b)) = -\operatorname{sgn}\sigma$.

PROOF. For the pairs $(j, k)$ with $j < k$, we are to compare $\sigma(k) - \sigma(j)$ with $\sigma(a\ b)(k) - \sigma(a\ b)(j)$. There are five cases. Without loss of generality, we may assume that $a < b$.

Case 1. If neither $j$ nor $k$ equals $a$ or $b$, then $\sigma(a\ b)(k) - \sigma(a\ b)(j) = \sigma(k) - \sigma(j)$. Thus such pairs $(j, k)$ make the same contribution to the product for $\sigma(a\ b)$ as to the product for $\sigma$, and they can be ignored.

Case 2. If one of $j$ and $k$ equals one of $a$ and $b$ while the other does not, there are three situations of interest. For each we compare the contributions of two such pairs together. The first situation is that of pairs $(a, t)$ and $(t, b)$ with $a < t < b$. These together contribute the factors $(\sigma(t) - \sigma(a))$ and $(\sigma(b) - \sigma(t))$ to the product for $\sigma$, and they contribute the factors $(\sigma(t) - \sigma(b))$ and $(\sigma(a) - \sigma(t))$ to the product for $\sigma(a\ b)$. Since

$$(\sigma(t) - \sigma(a))(\sigma(b) - \sigma(t)) = (\sigma(t) - \sigma(b))(\sigma(a) - \sigma(t)),$$

the pairs together make the same contribution to the product for $\sigma(a\ b)$ as to the product for $\sigma$, and they can be ignored.
Case 3. Continuing with matters as in Case 2, we next consider pairs $(a, t)$ and $(b, t)$ with $a < b < t$. These together contribute the factors $(\sigma(t) - \sigma(a))$ and $(\sigma(t) - \sigma(b))$ to the product for $\sigma$, and they contribute the factors $(\sigma(t) - \sigma(b))$ and $(\sigma(t) - \sigma(a))$ to the product for $\sigma(a\ b)$. Since

$$(\sigma(t) - \sigma(a))(\sigma(t) - \sigma(b)) = (\sigma(t) - \sigma(b))(\sigma(t) - \sigma(a)),$$

the pairs together make the same contribution to the product for $\sigma(a\ b)$ as to the product for $\sigma$, and they can be ignored.

Case 4. Still with matters as in Case 2, we consider pairs $(t, a)$ and $(t, b)$ with $t < a < b$. Arguing as in Case 3, we are led to an equality

$$(\sigma(a) - \sigma(t))(\sigma(b) - \sigma(t)) = (\sigma(b) - \sigma(t))(\sigma(a) - \sigma(t)),$$

and these pairs can be ignored.

Case 5. Finally we consider the pair $(a, b)$ itself. It contributes $\sigma(b) - \sigma(a)$ to the product for $\sigma$, and it contributes $\sigma(a) - \sigma(b)$ to the product for $\sigma(a\ b)$. These are negatives of one another, and we get a net contribution of one minus sign in comparing our two product formulas. The lemma follows. □

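Proposition 1.24. The signs of permutations of $\{1, 2, \dots, n\}$ have the following properties:

(a) $\operatorname{sgn} 1 = +1$,

(b) $\operatorname{sgn}\sigma = (-1)^k$ if $\sigma$ can be written as the product of $k$ transpositions,

(c) $\operatorname{sgn}(\sigma\tau) = (\operatorname{sgn}\sigma)(\operatorname{sgn}\tau)$,

(d) $\operatorname{sgn}(\sigma^{-1}) = \operatorname{sgn}\sigma$.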

PROOF. Conclusion (a) is immediate from the definition. For (b), let $\sigma = \tau_1\cdots\tau_k$ with each $\tau_i$ equal to a transposition. We apply Lemma 1.23 recursively, using (a) at the end:

$$
\begin{aligned}
\operatorname{sgn}(\tau_1\cdots\tau_k) &= (-1)\operatorname{sgn}(\tau_1\cdots\tau_{k-1}) = (-1)^2\operatorname{sgn}(\tau_1\cdots\tau_{k-2})\\
&= \cdots = (-1)^{k-1}\operatorname{sgn}\tau_1 = (-1)^k\operatorname{sgn} 1 = (-1)^k.
\end{aligned}
$$

For (c), Corollary 1.22 shows that any permutation is the product of transpositions. If $\sigma$ is the product of $k$ transpositions and $\tau$ is the product of $l$ transpositions, then $\sigma\tau$ is manifestly the product of $k + l$ transpositions. Thus (c) follows from (b). Finally (d) follows from (c) and (a) by taking $\tau = \sigma^{-1}$. □
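Since the permutations of $\{1, 2, 3, 4\}$ number only $24$, all four parts of Proposition 1.24 can be confirmed by brute force in that case. The following sketch (hypothetical names; permutations as bottom rows) computes signs from the defining product, counts transpositions through the cycle lengths as in Corollary 1.22, and checks (a) through (d). It is a sanity check for one small $n$, not a substitute for the proof.

```python
from itertools import combinations, permutations


def sign(perm):
    """sgn from the defining product formula; perm = (perm(1), ..., perm(n))."""
    signed = unsigned = 1
    for j, k in combinations(range(len(perm)), 2):   # 0-based positions j < k
        signed *= perm[k] - perm[j]
        unsigned *= k - j
    return signed // unsigned


def transposition_count(perm):
    """Number of transpositions in the factorization from Corollary 1.22:
    each cycle of length L contributes L - 1, so fixed points contribute 0."""
    seen, count = set(), 0
    for start in range(1, len(perm) + 1):
        length, j = 0, start
        while j not in seen:        # walk the cycle through start, if it is new
            seen.add(j)
            j = perm[j - 1]
            length += 1
        count += max(length - 1, 0)
    return count


def compose(s, t):
    """The product s t: apply t first, then s."""
    return tuple(s[t[j] - 1] for j in range(len(t)))


def inverse(perm):
    inv = [0] * len(perm)
    for j, image in enumerate(perm, start=1):
        inv[image - 1] = j
    return tuple(inv)


group = list(permutations(range(1, 5)))      # all 24 permutations of {1, 2, 3, 4}
part_a = sign((1, 2, 3, 4)) == 1
part_b = all(sign(p) == (-1) ** transposition_count(p) for p in group)
part_c = all(sign(compose(s, t)) == sign(s) * sign(t) for s in group for t in group)
part_d = all(sign(inverse(p)) == sign(p) for p in group)
```

The count used in `part_b` is the one produced by combining Proposition 1.21 with Corollary 1.22; part (b) asserts that any other factorization into transpositions gives the same parity.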

Our discussion of signs has so far attached signs only to permutations of $S = \{1, 2, \dots, n\}$. If we are given some other set $S'$ of $n$ elements and we want to adapt our discussion of signs so that it applies to permutations of $S'$, we need to identify $S$ with $S'$, say by a one-one onto function $\varphi : S \to S'$. If $\sigma$ is a permutation of $S'$, then $\varphi^{-1}\sigma\varphi$ is a permutation of $S$, and we can define $\operatorname{sgn}_\varphi(\sigma) = \operatorname{sgn}(\varphi^{-1}\sigma\varphi)$. The question is whether this definition is independent of $\varphi$.

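Fortunately the answer is yes, and the proof is easy. Suppose that $\psi : S \to S'$ is a second one-one onto function, so that $\operatorname{sgn}_\psi(\sigma) = \operatorname{sgn}(\psi^{-1}\sigma\psi)$. Then $\varphi^{-1}\psi = \tau$ is a permutation of $\{1, 2, \dots, n\}$, and (c) and (d) in Proposition 1.24 give

$$
\begin{aligned}
\operatorname{sgn}_\psi(\sigma) &= \operatorname{sgn}(\psi^{-1}\sigma\psi) = \operatorname{sgn}(\tau^{-1}\varphi^{-1}\sigma\varphi\tau)\\
&= \operatorname{sgn}(\tau^{-1})\operatorname{sgn}(\varphi^{-1}\sigma\varphi)\operatorname{sgn}(\tau) = \operatorname{sgn}(\tau)\operatorname{sgn}_\varphi(\sigma)\operatorname{sgn}(\tau) = \operatorname{sgn}_\varphi(\sigma).
\end{aligned}
$$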

Consequently the definition of signs of permutations of $\{1, 2, \dots, n\}$ can be carried over to give a definition of signs of permutations of any finite nonempty set of $n$ elements, and the resulting signs are independent of the way we enumerate the set. The conclusions of Proposition 1.24 are valid for this extended definition of signs of permutations.


5.  Row  Reduction 

This section and the next review row reduction and matrix algebra for rational, real, and complex matrices. As in Section 3 let $\mathbb{F}$ denote $\mathbb{Q}$ or $\mathbb{R}$ or $\mathbb{C}$. The members of $\mathbb{F}$ are called scalars.

The term “row reduction” refers to the main part of the algorithm used for solving simultaneous systems of algebraic linear equations with coefficients in $\mathbb{F}$. Such a system is of the form

$$
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1,\\
&\;\;\vdots\\
a_{k1}x_1 + a_{k2}x_2 + \cdots + a_{kn}x_n &= b_k,
\end{aligned}
$$

where the $a_{ij}$ and $b_i$ are known scalars and the $x_j$ are the unknowns, or variables. The algorithm makes repeated use of three operations on the equations, each of which preserves the set of solutions $(x_1, \dots, x_n)$ because its inverse is an operation of the same kind:

(i) interchange two equations,

(ii) multiply an equation by a nonzero scalar,

(iii) replace an equation by the sum of it and a multiple of some other equation.
The repeated writing of the variables in carrying out these steps is tedious and unnecessary, since the steps affect only the known coefficients. Instead, we can simply work with an array of the form

$$\left(\begin{array}{cccc|c} a_{11} & a_{12} & \cdots & a_{1n} & b_1 \\ \vdots & \vdots & & \vdots & \vdots \\ a_{k1} & a_{k2} & \cdots & a_{kn} & b_k \end{array}\right).$$

The individual scalars appearing in the array are called entries. The above operations on equations correspond exactly to operations on the rows³ of the array, and they become

(i) interchange two rows,

(ii) multiply a row by a nonzero scalar,

(iii) replace a row by the sum of it and a multiple of some other row.

Any operation of these types is called an elementary row operation. The vertical line in the array is handy from one point of view in that it separates the left sides of the equations from the right sides; if we have more than one set of right sides, we can include all of them to the right of the vertical line and thereby solve all the systems at the same time. But from another point of view, the vertical line is unnecessary since it does not affect which operation we perform at a particular time. Let us therefore drop it, abbreviating the system as

$$\begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} & b_1 \\ \vdots & \vdots & & \vdots & \vdots \\ a_{k1} & a_{k2} & \cdots & a_{kn} & b_k \end{pmatrix}.$$


The main step in solving the system is to apply the three operations in succession to the array to reduce it to a particularly simple form. An array with $k$ rows and $m$ columns⁴ is in reduced row-echelon form if it meets several conditions:

• Each member of the first $l$ of the rows, for some $l$ with $0 \le l \le k$, has at least one nonzero entry, and the other rows have all entries $0$.

• Each of the nonzero rows has $1$ as its first nonzero entry; let us say that the $i$th nonzero row has this $1$ in its $j(i)$th entry.

• The integers $j(i)$ are to be strictly increasing as a function of $i$, and the only entry in the $j(i)$th column that is nonzero is to be the one in the $i$th row.


Proposition 1.25. Any array with $k$ rows and $m$ columns can be transformed into reduced row-echelon form by a succession of steps of types (i), (ii), (iii).


³ “Rows” are understood to be horizontal, while “columns” are vertical.

⁴ In the above displayed matrix, the array has $m = n + 1$ columns.
In fact, the transformation in the proposition is carried out by an algorithm known as the method of row reduction of the array. Let us begin with an example, indicating the particular operation at each stage by a label over an arrow $\mapsto$. To keep the example from being unwieldy, we consolidate steps of type (iii) into a single step when the “other row” is the same.


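Example. In this example, $k = m = 4$. Row reduction gives

$$
\begin{aligned}
&\begin{pmatrix} 0&0&2&7\\ 1&-1&1&1\\ -1&1&-4&5\\ -2&2&-5&4 \end{pmatrix}
\overset{\text{(i)}}{\longmapsto}
\begin{pmatrix} 1&-1&1&1\\ 0&0&2&7\\ -1&1&-4&5\\ -2&2&-5&4 \end{pmatrix}
\overset{\text{(iii)}}{\longmapsto}
\begin{pmatrix} 1&-1&1&1\\ 0&0&2&7\\ 0&0&-3&6\\ 0&0&-3&6 \end{pmatrix}\\
&\overset{\text{(ii)}}{\longmapsto}
\begin{pmatrix} 1&-1&1&1\\ 0&0&1&\frac{7}{2}\\ 0&0&-3&6\\ 0&0&-3&6 \end{pmatrix}
\overset{\text{(iii)}}{\longmapsto}
\begin{pmatrix} 1&-1&0&-\frac{5}{2}\\ 0&0&1&\frac{7}{2}\\ 0&0&0&\frac{33}{2}\\ 0&0&0&\frac{33}{2} \end{pmatrix}\\
&\overset{\text{(ii)}}{\longmapsto}
\begin{pmatrix} 1&-1&0&-\frac{5}{2}\\ 0&0&1&\frac{7}{2}\\ 0&0&0&1\\ 0&0&0&\frac{33}{2} \end{pmatrix}
\overset{\text{(iii)}}{\longmapsto}
\begin{pmatrix} 1&-1&0&0\\ 0&0&1&0\\ 0&0&0&1\\ 0&0&0&0 \end{pmatrix}.
\end{aligned}
$$

The final matrix here is in reduced row-echelon form. In the notation of the definition, the number of nonzero rows in the reduced row-echelon form is $l = 3$, and the integers $j(i)$ are $j(1) = 1$, $j(2) = 3$, and $j(3) = 4$.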


The example makes clear what the algorithm is that proves Proposition 1.25. We find the first nonzero column, apply an interchange (an operation of type (i)) if necessary to make the first entry in the column nonzero, multiply by a nonzero scalar to make the first entry $1$ (an operation of type (ii)), and apply operations of type (iii) to eliminate the other nonzero entries in the column. Then we look for the next column with a nonzero entry in entries $2$ and later, interchange to get the nonzero entry into entry $2$ of the column, multiply to make the entry $1$, and apply operations of type (iii) to eliminate the other entries in the column. Continuing in this way, we arrive at reduced row-echelon form.
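The algorithm just described can be sketched as follows, with exact rational arithmetic so that pivots become exactly $1$ and eliminated entries exactly $0$. The name `rref` and the choice of also returning the $1$-based corner columns $j(1), \dots, j(l)$ are assumptions of this sketch, not part of the text:

```python
from fractions import Fraction


def rref(rows):
    """Return (R, pivots): the reduced row-echelon form of the array given as a
    list of rows, and the 1-based column indices j(1) < j(2) < ... < j(l)."""
    A = [[Fraction(x) for x in row] for row in rows]
    k = len(A)
    m = len(A[0]) if A else 0
    pivots = []
    r = 0                                   # index of the next row to fill
    for col in range(m):
        # look for a nonzero entry in this column at or below row r
        pivot_row = next((i for i in range(r, k) if A[i][col] != 0), None)
        if pivot_row is None:
            continue
        A[r], A[pivot_row] = A[pivot_row], A[r]          # type (i): interchange
        lead = A[r][col]
        A[r] = [x / lead for x in A[r]]                   # type (ii): make the entry 1
        for i in range(k):                                # type (iii): clear the column
            if i != r and A[i][col] != 0:
                c = A[i][col]
                A[i] = [x - c * y for x, y in zip(A[i], A[r])]
        pivots.append(col + 1)
        r += 1
        if r == k:
            break
    return A, pivots
```

Applied to the $4 \times 4$ array of the earlier example, it returns the same reduced row-echelon form, with corner columns $j(1) = 1$, $j(2) = 3$, $j(3) = 4$.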

In the general case, as soon as our array, which contains both sides of our system of equations, has been transformed into reduced row-echelon form, we can read off exactly what the solutions are. It will be handy to distinguish two kinds of variables among $x_1, \dots, x_n$ without including any added variables $x_{n+1}, \dots, x_m$ in either of the classes. The corner variables are those $x_j$'s for which $j$ is $\le n$ and is some $j(i)$ in the definition of “reduced row-echelon form,” and the other $x_j$'s with $j \le n$ will be called independent variables. Let us describe the last steps of the solution technique in the setting of an example. We restore the vertical line that separated the data on the two sides of the equations.
Example. We consider what might happen to a certain system of 4 equations in 4 unknowns. Putting the data in place for the right side makes the array have 4 rows and 5 columns. We transform the array into reduced row-echelon form and suppose that it comes out to be

$$\left(\begin{array}{cccc|c} 1 & -1 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 & 2 \\ 0 & 0 & 0 & 1 & 3 \\ 0 & 0 & 0 & 0 & 1 \text{ or } 0 \end{array}\right).$$


If the lower right entry is $1$, there are no solutions. In fact, the last row corresponds to an equation $0 = 1$, which announces a contradiction. More generally, if any row of 0's to the left of the vertical line is equal to something nonzero, there are no solutions. In other words, there are no solutions to a system if the reduced row-echelon form of the entire array has more nonzero rows than the reduced row-echelon form of the part of the array to the left of the vertical line.

On the other hand, if the lower right entry is $0$, then there are solutions. To see this, we restore the reduced array to a system of equations:

$$
\begin{aligned}
x_1 - x_2 &= 1,\\
x_3 &= 2,\\
x_4 &= 3;
\end{aligned}
$$

we move the independent variables (namely $x_2$ here) to the right side to obtain

$$
\begin{aligned}
x_1 &= 1 + x_2,\\
x_3 &= 2,\\
x_4 &= 3;
\end{aligned}
$$

and we collect everything in a tidy fashion as

$$\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \\ 2 \\ 3 \end{pmatrix} + x_2 \begin{pmatrix} 1 \\ 1 \\ 0 \\ 0 \end{pmatrix}.$$

The independent variables are allowed to take on arbitrary values, and we have succeeded in giving a formula for the solution that corresponds to an arbitrary set of values for the independent variables.

The method in the above example works completely generally. We obtain solutions whenever each row of 0's to the left of the vertical line is matched by a $0$ on the right side, and we obtain no solutions otherwise. In the case that we are solving several systems with the same left sides, solutions exist for each of the systems if the reduced row-echelon form of the entire array has the same number of nonzero rows as the reduced row-echelon form of the part of the array to the left of the vertical line.
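The read-off procedure can be sketched for a single system as follows. The function `read_solutions` is hypothetical; it takes an augmented array already in reduced row-echelon form with $n$ coefficient columns and one right-side column, and returns either `None` (no solutions) or the first column of the solution formula together with the column multiplying each independent variable:

```python
from fractions import Fraction


def read_solutions(R, n):
    """Read off the solutions from an augmented array R in reduced row-echelon
    form whose first n columns hold coefficients and whose last column holds
    the right sides.

    Returns None if some row of 0's on the left meets a nonzero right side;
    otherwise (particular, columns), so that every solution is
    particular + sum over independent x_j of x_j * columns[j] (j is 1-based).
    """
    corners = {}                        # 0-based corner column -> its row
    for i, row in enumerate(R):
        nonzero = [j for j in range(n) if row[j] != 0]
        if not nonzero:
            if row[n] != 0:
                return None             # the equation 0 = (nonzero) has no solutions
            continue
        corners[nonzero[0]] = i         # the first nonzero entry is the corner 1
    particular = [Fraction(0)] * n
    for j, i in corners.items():
        particular[j] = Fraction(R[i][n])
    columns = {}
    for free in (j for j in range(n) if j not in corners):
        column = [Fraction(0)] * n
        column[free] = Fraction(1)      # 1 in the entry of the independent variable
        for j, i in corners.items():
            column[j] = -Fraction(R[i][free])   # x_free moved to the right side
        columns[free + 1] = column
    return particular, columns
```

For the example with lower right entry $0$, it returns the particular solution $(1, 0, 2, 3)$ and the single column $(1, 1, 0, 0)$ for $x_2$; with lower right entry $1$ it returns `None`.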

Let us record some observations about the method for solving systems of linear equations and then some observations about the method of row reduction itself.

Proposition 1.26. In the solution process for a system of $k$ linear equations in $n$ variables with the vertical line in place,

(a) the sum of the number of corner variables and the number of independent variables is $n$,

(b) the number of corner variables equals the number of nonzero rows on the left side of the vertical line and hence is $\le k$,

(c) when solutions exist, they are of the form

$$\text{column} + (\text{independent variable}) \times \text{column} + \cdots + (\text{independent variable}) \times \text{column}$$

in such a way that each independent variable $x_j$ is a free parameter in $\mathbb{F}$, the column multiplying $x_j$ has a $1$ in its $j$th entry, and the other columns have a $0$ in that entry,

(d) a homogeneous system, i.e., one with all right sides equal to $0$, has a nonzero solution if the number $k$ of equations is $<$ the number $n$ of variables,

(e) the solutions of an inhomogeneous system, i.e., one in which the right sides are not necessarily all $0$, are all given by the sum of any one particular solution and an arbitrary solution of the corresponding homogeneous system.

PROOF.  Conclusions (a), (b), and (c) follow immediately by inspection of the solution method. For (d), we observe that no contradictory equation can arise when the right sides are 0 and, in addition, that there must be at least one independent variable by (a), since (b) shows that the number of corner variables is ≤ k < n. Conclusion (e) is apparent from (c), since the first column in the solution written in (c) is a column of 0's in the homogeneous case.  □
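A small Python sketch of the solution method may make parts (a), (c), and (e) concrete. It assumes F = Q, using `fractions.Fraction` for exact arithmetic, indexes rows and columns from 0 rather than 1, and the names `rref` and `solve` are hypothetical helpers, not notation from the text. `solve` returns one particular solution together with one vector per independent variable, so that every solution is the particular solution plus an arbitrary combination of those vectors.

```python
from fractions import Fraction

def rref(rows):
    """Reduced row-echelon form of a matrix given as a list of rows.

    Returns (R, corners), where corners lists the column index of the
    corner (leading 1) in each nonzero row of R, in order.
    """
    R = [[Fraction(x) for x in row] for row in rows]
    k = len(R)
    n = len(R[0]) if R else 0
    corners = []
    r = 0  # row that will receive the next corner
    for c in range(n):
        p = next((i for i in range(r, k) if R[i][c] != 0), None)
        if p is None:
            continue  # no corner in this column
        R[r], R[p] = R[p], R[r]              # interchange two rows
        piv = R[r][c]
        R[r] = [x / piv for x in R[r]]       # make the corner entry 1
        for i in range(k):                   # clear the rest of column c
            if i != r and R[i][c] != 0:
                f = R[i][c]
                R[i] = [a - f * b for a, b in zip(R[i], R[r])]
        corners.append(c)
        r += 1
    return R, corners

def solve(A, b):
    """All solutions of AX = b, in the form of Proposition 1.26(c).

    Returns None if a contradictory equation 0 = (nonzero) appears;
    otherwise (particular, homogeneous), so that the solutions are exactly
    particular + t_1 * homogeneous[0] + t_2 * homogeneous[1] + ...
    """
    n = len(A[0])
    R, corners = rref([list(row) + [bi] for row, bi in zip(A, b)])
    if n in corners:          # a corner to the right of the vertical line
        return None
    independent = [j for j in range(n) if j not in corners]
    particular = [Fraction(0)] * n
    for row, c in zip(R, corners):
        particular[c] = row[n]
    homogeneous = []
    for f in independent:
        v = [Fraction(0)] * n
        v[f] = Fraction(1)    # 1 in the entry of this independent variable
        for row, c in zip(R, corners):
            v[c] = -row[f]    # solve the corner equation with x_f = 1
        homogeneous.append(v)
    return particular, homogeneous
```

For instance, `solve([[1, 2, 3], [4, 5, 6]], [6, 15])` finds one independent variable, the particular solution (0, 3, 0), and the homogeneous solution (1, −2, 1); the latter is nonzero, as (d) predicts since k = 2 < n = 3.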

Proposition  1.27.  For an array with k rows and n columns in reduced row-echelon form,

(a)  the  sum  of  the  number  of  corner  variables  and  the  number  of  independent 
variables  is  n, 

(b)  the number of corner variables equals the number of nonzero rows and
hence is ≤ k,

(c)  when k = n, either the array is of the form

$$\begin{pmatrix} 1 & 0 & 0 & \cdots & 0\\ 0 & 1 & 0 & \cdots & 0\\ 0 & 0 & 1 & \cdots & 0\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & 0 & \cdots & 1 \end{pmatrix}$$

or else it has a row of 0's.

PROOF.  Conclusions  (a)  and  (b)  are  immediate  by  inspection.  In  (c),  failure  of 
the  reduced  row-echelon  form  to  be  as  indicated  forces  there  to  be  some  noncorner 
variable,  so  that  the  number  of  corner  variables  is  <  n.  By  (b),  the  number  of 
nonzero  rows  is  <  n,  and  hence  there  is  a  row  of  0’s.  □ 

One final comment: For the special case of n equations in n variables, some readers may be familiar with a formula known as “Cramer’s rule” for using determinants to solve the system when the determinant of the array of coefficients on the left side of the vertical line is nonzero. Determinants, including their evaluation, and Cramer’s rule will be discussed in Chapter II. The point to make for current purposes is that the use of Cramer’s rule for computation is, for n large, normally a more lengthy process than the method of row reduction. In fact, Problem 13 at the end of this chapter shows that the number of steps for solving the system via row reduction is at most a certain multiple of n³. On the other hand, the typical number of steps for solving the system by rote application of Cramer’s rule is approximately a multiple of n⁴.
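To see this comparison in rough numerical form, the following Python sketch counts multiplications and divisions (one possible notion of "step"; Problem 13 counts a little differently) for row reduction and for Cramer's rule, with each determinant itself computed by elimination. Everything here is an illustrative assumption: the function names, the step-counting convention, and the use of two standard determinant facts not yet proved at this point (an interchange of rows changes the sign, and a triangular matrix has determinant equal to the product of its diagonal entries).

```python
from fractions import Fraction

def det_by_elimination(M, counter):
    """Determinant of the square matrix M by reduction to upper-triangular
    form; counter[0] is increased by each multiplication or division."""
    A = [[Fraction(x) for x in row] for row in M]
    n = len(A)
    d = Fraction(1)
    for c in range(n):
        p = next((i for i in range(c, n) if A[i][c] != 0), None)
        if p is None:
            return Fraction(0)          # no corner in column c
        if p != c:
            A[c], A[p] = A[p], A[c]
            d = -d                      # an interchange changes the sign
        for i in range(c + 1, n):
            f = A[i][c] / A[c][c]
            counter[0] += 1
            for j in range(c, n):
                A[i][j] -= f * A[c][j]
                counter[0] += 1
        d *= A[c][c]                    # product of the diagonal entries
        counter[0] += 1
    return d

def cramer(A, b, counter):
    """Cramer's rule: x_j = det(A_j) / det(A), where A_j is A with its
    jth column replaced by b. Assumes det(A) != 0."""
    n = len(A)
    D = det_by_elimination(A, counter)
    x = []
    for j in range(n):
        Aj = [row[:j] + [bi] + row[j + 1:] for row, bi in zip(A, b)]
        x.append(det_by_elimination(Aj, counter) / D)
        counter[0] += 1
    return x

def gauss_solve(A, b, counter):
    """Row reduction of (A | b) to triangular form, then back
    substitution. Assumes A is invertible."""
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(bi)] for row, bi in zip(A, b)]
    for c in range(n):
        p = next(i for i in range(c, n) if M[i][c] != 0)
        M[c], M[p] = M[p], M[c]
        for i in range(c + 1, n):
            f = M[i][c] / M[c][c]
            counter[0] += 1
            for j in range(c, n + 1):
                M[i][j] -= f * M[c][j]
                counter[0] += 1
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        counter[0] += n - 1 - i
        x[i] = s / M[i][i]
        counter[0] += 1
    return x

for n in (4, 8, 16):
    # An invertible test matrix: n + 1 on the diagonal, 1 elsewhere.
    A = [[n + 1 if i == j else 1 for j in range(n)] for i in range(n)]
    b = [1] * n
    steps_gauss, steps_cramer = [0], [0]
    assert gauss_solve(A, b, steps_gauss) == cramer(A, b, steps_cramer)
    print(n, steps_gauss[0], steps_cramer[0])
```

Both methods return the same exact solution; the printed counts should show the Cramer count exceeding the row-reduction count by a factor that grows roughly in proportion to n, in line with n⁴ versus n³.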


6.  Matrix  Operations 

A  rectangular  array  of  scalars  (i.e.,  members  of  F)  with  k  rows  and  n  columns 
is called a k-by-n matrix. More precisely a k-by-n matrix over F is a function from {1, …, k} × {1, …, n} to F. The expression “k-by-n” is called the size of the matrix. The value of the function at the ordered pair (i, j) is often indicated with subscript notation, such as $a_{ij}$, rather than with the usual function notation $a(i, j)$. It is called the $(i, j)$th entry. Two matrices are equal if they are the same function on ordered pairs; this means that they have the same size and their corresponding entries are equal. A matrix is called square if its number of rows equals its number of columns. A square matrix with all entries 0 for i ≠ j is called diagonal, and the entries with i = j are the diagonal entries.

As the reader likely already knows, it is customary to write matrices in rectangular patterns. By convention the first index always tells the number of the row and the second index tells the number of the column. Thus a typical 2-by-3 matrix is

$$\begin{pmatrix} a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\end{pmatrix}.$$

In the indication of the size of the matrix, here 2-by-3, the 2 refers to the number of rows and the 3 refers to the number of columns.

An n-dimensional row vector is a 1-by-n matrix, while a k-dimensional column vector is a k-by-1 matrix. The set of all k-dimensional column vectors is denoted by $\mathbb{F}^k$. The set $\mathbb{F}^k$ is to be regarded as the space of all ordinary garden-variety vectors. For economy of space, books often write such vectors horizontally with entries separated by commas, for example as $(c_1, c_2, c_3)$, and it is extremely important to treat such vectors as column vectors, not as row vectors, in order to get matrix operations and the effect of linear transformations to correspond nicely.⁵ Thus in this book, $(c_1, c_2, c_3)$ is to be regarded as a space-saving way of writing the column vector

$$\begin{pmatrix} c_1\\ c_2\\ c_3\end{pmatrix}.$$

If a matrix is denoted by some letter like A, its $(i, j)$th entry will typically be denoted by $A_{ij}$. In the reverse direction, sometimes a matrix is assembled from its individual entries, which may be expressions depending on i and j. If some such expression $a_{ij}$ is given for each pair $(i, j)$, then we denote the corresponding matrix by $\big[a_{ij}\big]_{\substack{i=1,\dots,k\\ j=1,\dots,n}}$, or simply by $[a_{ij}]$ if there is no possibility of confusion.
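The definition of a matrix as a function on index pairs, and the bracket notation for assembling one from a rule, can be mirrored directly in Python. This is only an illustrative sketch: the names `from_rule`, `size`, and `to_rows` are hypothetical, entries are stored in a dictionary keyed by the pair (i, j) with 1-based indices as in the text, and equality of two such dictionaries is exactly equality as functions (same domain, hence same size, and same values).

```python
from typing import Callable, Dict, List, Tuple

# A k-by-n matrix viewed literally as a function on {1,...,k} x {1,...,n},
# stored as a dict from index pairs (i, j) to entries.
Matrix = Dict[Tuple[int, int], object]

def from_rule(k: int, n: int, rule: Callable[[int, int], object]) -> Matrix:
    """Assemble [rule(i, j)] for i = 1..k and j = 1..n."""
    return {(i, j): rule(i, j) for i in range(1, k + 1) for j in range(1, n + 1)}

def size(A: Matrix) -> Tuple[int, int]:
    """Read off the size (k, n) from the domain; assumes k, n >= 1."""
    return max(i for i, _ in A), max(j for _, j in A)

def to_rows(A: Matrix) -> List[list]:
    """Lay the function out in the customary rectangular pattern."""
    k, n = size(A)
    return [[A[(i, j)] for j in range(1, n + 1)] for i in range(1, k + 1)]

A = from_rule(2, 3, lambda i, j: 10 * i + j)   # the matrix [10i + j]
```

Here `to_rows(A)` lays out the 2-by-3 matrix with rows (11, 12, 13) and (21, 22, 23), and a 2-by-3 and a 3-by-2 zero matrix compare unequal because their domains differ.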

Various operations are defined on matrices. Specifically let $M_{kn}(\mathbb{F})$ be the set of k-by-n matrices with entries in F, so that $M_{k1}(\mathbb{F})$ is the same thing as $\mathbb{F}^k$. Addition of matrices is defined whenever two matrices have the same size, and it is defined entry by entry; thus if A and B are in $M_{kn}(\mathbb{F})$, then A + B is the member of $M_{kn}(\mathbb{F})$ with $(A + B)_{ij} = A_{ij} + B_{ij}$. Scalar multiplication on matrices is defined entry by entry as well; thus if A is in $M_{kn}(\mathbb{F})$ and c is in F, then cA is the member of $M_{kn}(\mathbb{F})$ with $(cA)_{ij} = cA_{ij}$. The matrix $(-1)A$ is denoted by $-A$. The k-by-n matrix with 0 in each entry is called a zero matrix. Ordinarily it is denoted simply by 0; if some confusion is possible in a particular situation, more precise notation will be introduced at the time. With these operations the set $M_{kn}(\mathbb{F})$ has the following properties:

(i)  the  operation  of  addition  satisfies 

(a)  $A + (B + C) = (A + B) + C$ for all A, B, C in $M_{kn}(\mathbb{F})$ (associative law),

(b)  $A + 0 = 0 + A = A$ for all A in $M_{kn}(\mathbb{F})$,

(c)  $A + (-A) = (-A) + A = 0$ for all A in $M_{kn}(\mathbb{F})$,

(d)  $A + B = B + A$ for all A and B in $M_{kn}(\mathbb{F})$ (commutative law);


⁵ The alternatives are unpleasant. Either one is forced to write certain functions in the unnatural notation $x \mapsto (x)f$, or the correspondence is forced to involve transpose operations on frequent occasions. Unhappily, books following either of these alternative conventions may be found.
(ii)  the operation of scalar multiplication satisfies

(a)  $(cd)A = c(dA)$ for all A in $M_{kn}(\mathbb{F})$ and all scalars c and d,

(b)  $1A = A$ for all A in $M_{kn}(\mathbb{F})$ and for the scalar 1;

(iii)  the two operations are related by the distributive laws

(a)  $c(A + B) = cA + cB$ for all A and B in $M_{kn}(\mathbb{F})$ and for all scalars c,

(b)  $(c + d)A = cA + dA$ for all A in $M_{kn}(\mathbb{F})$ and all scalars c and d.

Since  addition  and  scalar  multiplication  are  defined  entry  by  entry,  all  of  these 
identities  follow  from  the  corresponding  identities  for  members  of  F. 
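Since addition and scalar multiplication are entrywise, each is a one-liner in code. In this Python sketch, matrices are lists of rows, the helper names are hypothetical, and the closing assertions are only a spot check of two of the laws on particular rational matrices, not a proof.

```python
from fractions import Fraction

def same_size(A, B):
    return len(A) == len(B) and all(len(r) == len(s) for r, s in zip(A, B))

def mat_add(A, B):
    """(A + B)_ij = A_ij + B_ij; defined only for matrices of the same size."""
    if not same_size(A, B):
        raise ValueError("addition needs matrices of the same size")
    return [[a + b for a, b in zip(r, s)] for r, s in zip(A, B)]

def mat_scale(c, A):
    """(cA)_ij = c * A_ij."""
    return [[c * a for a in r] for r in A]

def zero(k, n):
    """The k-by-n zero matrix."""
    return [[0] * n for _ in range(k)]

# Spot check of laws (i)(c) and (iii)(b) on sample matrices in M_23(Q).
A = [[1, Fraction(1, 2), 0], [2, 3, -1]]
B = [[0, 1, 1], [Fraction(1, 3), 0, 5]]
c, d = Fraction(2), Fraction(-3, 4)
assert mat_add(A, mat_scale(-1, A)) == zero(2, 3)
assert mat_scale(c + d, A) == mat_add(mat_scale(c, A), mat_scale(d, A))
```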

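Multiplication of matrices is defined in such a way that the kind of system of linear equations discussed in the previous section can be written as a matrix equation in the form AX = B, where

$$A = \begin{pmatrix} a_{11} & \cdots & a_{1n}\\ \vdots & & \vdots\\ a_{k1} & \cdots & a_{kn}\end{pmatrix}, \qquad X = \begin{pmatrix} x_1\\ \vdots\\ x_n\end{pmatrix}, \qquad B = \begin{pmatrix} b_1\\ \vdots\\ b_k\end{pmatrix}.$$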
More precisely if A is a k-by-m matrix and B is an m-by-n matrix, then the product C = AB is the k-by-n matrix defined by

$$C_{ij} = \sum_{l=1}^{m} A_{il}B_{lj}.$$

The $(i, j)$th entry of C is therefore the product of the ith row of A and the jth column of B.

Let  us  emphasize  that  the  condition  for  a  product  AB  to  be  defined  is  that 
the  number  of  columns  of  A  should  equal  the  number  of  rows  of  B.  With  this 
definition  the  system  of  equations  mentioned  above  is  indeed  of  the  form  AX  =  B. 
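The defining formula for the product translates line by line into code. In the following Python sketch, the name `mat_mul` is illustrative, matrices are lists of rows, and indices run from 0 rather than 1. The function refuses to multiply when the number of columns of A differs from the number of rows of B, and the closing lines check that one solution of a small system really satisfies AX = B.

```python
def mat_mul(A, B):
    """Product of a k-by-m matrix A and an m-by-n matrix B, computed from
    the definition C_ij = sum over l of A_il * B_lj."""
    m = len(B)
    if any(len(row) != m for row in A):
        raise ValueError("number of columns of A must equal number of rows of B")
    n = len(B[0])
    return [[sum(row[l] * B[l][j] for l in range(m)) for j in range(n)]
            for row in A]

# The system x1 + 2x2 + 3x3 = 6, 4x1 + 5x2 + 6x3 = 15 written as AX = B:
A = [[1, 2, 3], [4, 5, 6]]
X = [[0], [3], [0]]          # one solution, as a 3-by-1 column vector
assert mat_mul(A, X) == [[6], [15]]
```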

Proposition  1.28.  Matrix  multiplication  has  the  properties  that 

(a)  it is associative in the sense that (AB)C = A(BC), provided that the sizes match correctly, i.e., A is in $M_{km}(\mathbb{F})$, B is in $M_{mn}(\mathbb{F})$, and C is in $M_{np}(\mathbb{F})$,

(b)  it is distributive over addition in the sense that A(B + C) = AB + AC and (B + C)D = BD + CD if the sizes match correctly.

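Remark.  Matrix multiplication is not necessarily commutative, even for square matrices. For example, $\begin{pmatrix}1&0\\0&0\end{pmatrix}\begin{pmatrix}0&1\\0&0\end{pmatrix} = \begin{pmatrix}0&1\\0&0\end{pmatrix}$, while $\begin{pmatrix}0&1\\0&0\end{pmatrix}\begin{pmatrix}1&0\\0&0\end{pmatrix} = \begin{pmatrix}0&0\\0&0\end{pmatrix}$.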
Proof.  For (a), we have

$$((AB)C)_{ij} = \sum_{t=1}^{n} (AB)_{it}C_{tj} = \sum_{t=1}^{n}\sum_{s=1}^{m} A_{is}B_{st}C_{tj}$$

and

$$(A(BC))_{ij} = \sum_{s=1}^{m} A_{is}(BC)_{sj} = \sum_{s=1}^{m}\sum_{t=1}^{n} A_{is}B_{st}C_{tj},$$

and these are equal. For the first identity in (b), we have

$$(A(B + C))_{ij} = \sum_{l} A_{il}(B + C)_{lj} = \sum_{l} A_{il}(B_{lj} + C_{lj}) = \sum_{l} A_{il}B_{lj} + \sum_{l} A_{il}C_{lj} = (AB)_{ij} + (AC)_{ij},$$

and the second identity is proved similarly.  □


We have already defined the zero matrix 0 of a given size to be the matrix having 0 in each entry. This matrix has the property that 0A = 0 and B0 = 0 if the sizes match properly. The n-by-n identity matrix, denoted by I or sometimes 1, is defined to be the matrix with $I_{ij} = \delta_{ij}$, where $\delta_{ij}$ is the Kronecker delta defined by

$$\delta_{ij} = \begin{cases} 1 & \text{if } i = j,\\ 0 & \text{if } i \neq j.\end{cases}$$

In other words, the identity matrix is the square matrix of the form

$$I = \begin{pmatrix} 1 & 0 & 0 & \cdots & 0\\ 0 & 1 & 0 & \cdots & 0\\ 0 & 0 & 1 & \cdots & 0\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & 0 & \cdots & 1\end{pmatrix}.$$

It has the property that IA = A and BI = B whenever the sizes match properly.

Let A be an n-by-n matrix. We say that A is invertible and has the n-by-n matrix B as inverse if AB = BA = I. If B and C are n-by-n matrices with AB = I and CA = I, then associativity of multiplication (Proposition 1.28a) implies that B = IB = (CA)B = C(AB) = CI = C. Hence an inverse for A is unique if it exists. We write $A^{-1}$ for this inverse if it exists. Inverses of n-by-n matrices have the property that if A and D are invertible, then AD is invertible and $(AD)^{-1} = D^{-1}A^{-1}$; moreover, if A is invertible, then $A^{-1}$ is invertible and its inverse is A.
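Both closing facts about inverses reduce to associativity (Proposition 1.28a). Written out as a short derivation:

$$\begin{aligned} (AD)(D^{-1}A^{-1}) &= A(DD^{-1})A^{-1} = AIA^{-1} = AA^{-1} = I,\\ (D^{-1}A^{-1})(AD) &= D^{-1}(A^{-1}A)D = D^{-1}ID = D^{-1}D = I, \end{aligned}$$

so $D^{-1}A^{-1}$ is an inverse of AD; and the equations $A^{-1}A = AA^{-1} = I$ say precisely that A is an inverse of $A^{-1}$.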

The method of row reduction in the previous section suggests a way of computing the inverse of a matrix. Suppose that A is a square matrix to be inverted and we are seeking its inverse B. Then AB = I. Examining the definition of matrix multiplication, we see that this matrix equation means that the product of A and the first column of B equals the first column of I, the product of A and the second column of B equals the second column of I, and so on. We can thus think of a column of B as the unknowns in a system of linear equations, the known right sides being the entries of the column of the identity matrix. As the column index varies, the left sides of these equations do not change, since they are always given by A. So we can attempt to solve all of the systems (one for each column) simultaneously. For example, to attempt to invert

$$A = \begin{pmatrix} 1 & 2 & 3\\ 4 & 5 & 6\\ 7 & 8 & 10\end{pmatrix},$$

we set up

$$\left(\begin{array}{ccc|ccc} 1 & 2 & 3 & 1 & 0 & 0\\ 4 & 5 & 6 & 0 & 1 & 0\\ 7 & 8 & 10 & 0 & 0 & 1\end{array}\right).$$

Imagine doing the row reduction. We can hope that the result will be of the form

$$\left(\begin{array}{ccc|ccc} 1 & 0 & 0 & * & * & *\\ 0 & 1 & 0 & * & * & *\\ 0 & 0 & 1 & * & * & *\end{array}\right)$$

with the identity matrix on the left side of the vertical line. If this is indeed the result, then the computation shows that the matrix on the right side of the vertical line is the only possibility for $A^{-1}$. But does $A^{-1}$ in fact exist?

Actually,  another  question  arises  as  well.  According  to  Proposition  1.27c,  the 
other  possibility  in  applying  row  reduction  is  that  the  left  side  has  a  row  of  0's. 
In  this  case,  can  we  deduce  that  A-1  does  not  exist?  Or,  to  put  it  another  way, 
can  we  be  sure  that  some  row  of  the  reduced  row-echelon  form  has  all  0’s  on  the 
left  side  of  the  vertical  line  and  something  nonzero  on  the  right  side? 

All of the answers to these questions are yes, and we prove them in a moment. First we need to see that elementary row operations are given by matrix multiplications.


Proposition  1.29.  Each elementary row operation is given by left multiplication by an invertible matrix. The inverse matrix is the matrix of another elementary row operation.

Remark.  The square matrices giving these left multiplications are called elementary matrices.


PROOF.  For the interchange of rows i and j, the part of the elementary matrix in the rows and columns with i or j as index is

$$\begin{array}{c|cc} & i & j\\ \hline i & 0 & 1\\ j & 1 & 0\end{array}$$

and otherwise the matrix is the identity. This matrix is its own inverse.
For the multiplication of the ith row by a nonzero scalar c, the matrix is diagonal with c in the ith diagonal entry and with 1 in all other diagonal entries. The inverse matrix is of this form with $c^{-1}$ in place of c.

For the replacement of the ith row by the sum of the ith row and the product of a times the jth row, the part of the elementary matrix in the rows and columns with i or j as index is

$$\begin{array}{c|cc} & i & j\\ \hline i & 1 & a\\ j & 0 & 1\end{array}$$

and otherwise the matrix is the identity. The inverse of this matrix is the same except that a is replaced by −a.  □
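The three kinds of elementary matrices in this proof can be built and checked directly. This Python sketch is illustrative only: it uses 0-based row indices, `fractions.Fraction` entries (so F = Q), and hypothetical function names; `mat_mul` is a plain implementation of the product defined above.

```python
from fractions import Fraction

def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

def mat_mul(A, B):
    return [[sum(A[i][l] * B[l][j] for l in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def swap_matrix(n, i, j):
    """Elementary matrix interchanging rows i and j; its own inverse."""
    E = identity(n)
    E[i], E[j] = E[j], E[i]
    return E

def scale_matrix(n, i, c):
    """Elementary matrix multiplying row i by the nonzero scalar c;
    the inverse is scale_matrix(n, i, 1/c)."""
    if c == 0:
        raise ValueError("c must be nonzero")
    E = identity(n)
    E[i][i] = Fraction(c)
    return E

def add_matrix(n, i, j, a):
    """Elementary matrix replacing row i by row i + a * row j (i != j);
    the inverse is add_matrix(n, i, j, -a)."""
    E = identity(n)
    E[i][j] = Fraction(a)
    return E

A = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
# Left multiplication performs the row operation: row 1 becomes row 1 - 4 * row 0.
assert mat_mul(add_matrix(3, 1, 0, -4), A) == [[1, 2, 3], [0, -3, -6], [7, 8, 10]]
```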

Theorem  1.30.  The  following  conditions  on  an  n-by-n  square  matrix  A  are 
equivalent: 

(a)  the  reduced  row-echelon  form  of  A  is  the  identity, 

(b)  A  is  the  product  of  elementary  matrices, 

(c)  A  has  an  inverse, 


(d)  the system of equations AX = 0 with $X = \begin{pmatrix}x_1\\ \vdots\\ x_n\end{pmatrix}$ has only the solution X = 0.

PROOF.  If (a) holds, choose a sequence of elementary row operations that reduce A to the identity, and let $E_1, \dots, E_r$ be the corresponding elementary matrices given by Proposition 1.29. Then we have $E_r \cdots E_1 A = I$, and hence $A = E_1^{-1} \cdots E_r^{-1}$. The proposition says that each $E_j^{-1}$ is an elementary matrix, and thus (b) holds.

If (b) holds, then (c) holds because the elementary matrices are invertible and the product of invertible matrices is invertible.

If (c) holds and if AX = 0, then $X = IX = (A^{-1}A)X = A^{-1}(AX) = A^{-1}0 = 0$. Hence (d) holds.

If (d) holds, then the number of independent variables in the row reduction of A is 0, since by Proposition 1.26c any independent variable could be set equal to 1 to produce a nonzero solution. Proposition 1.26a shows that the number of corner variables is n, and parts (b) and (c) of Proposition 1.27 show that the reduced row-echelon form of A is I. Thus (a) holds.  □
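The proof that (a) implies (b) is constructive, and a Python sketch can follow it literally: record each elementary matrix used while reducing A, together with its inverse from Proposition 1.29, and then multiply the inverses in the order $E_1^{-1} \cdots E_r^{-1}$. The function names are hypothetical, entries are `fractions.Fraction` values (F = Q), and rows are indexed from 0.

```python
from fractions import Fraction

def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

def mat_mul(A, B):
    return [[sum(A[i][l] * B[l][j] for l in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def reduce_recording(A):
    """Row-reduce the square matrix A by left multiplications.

    Returns (ops, R): R is the reduced row-echelon form of A, and ops lists
    the pairs (E, E_inv) in the order applied, so that E_r ... E_1 A = R.
    """
    n = len(A)
    R = [[Fraction(x) for x in row] for row in A]
    ops = []

    def apply(E, E_inv):
        nonlocal R
        ops.append((E, E_inv))
        R = mat_mul(E, R)

    r = 0  # row that will receive the next corner
    for c in range(n):
        p = next((i for i in range(r, n) if R[i][c] != 0), None)
        if p is None:
            continue  # column c has no corner
        if p != r:  # interchange rows r and p; this matrix is its own inverse
            E = identity(n)
            E[r], E[p] = E[p], E[r]
            apply(E, E)
        piv = R[r][c]
        if piv != 1:  # scale row r by 1/piv; the inverse scales by piv
            E, E_inv = identity(n), identity(n)
            E[r][r], E_inv[r][r] = 1 / piv, piv
            apply(E, E_inv)
        for i in range(n):
            a = R[i][c]
            if i != r and a != 0:  # row i -= a * row r; the inverse adds it back
                E, E_inv = identity(n), identity(n)
                E[i][r], E_inv[i][r] = -a, a
                apply(E, E_inv)
        r += 1
    return ops, R

def product_of_inverses(ops, n):
    """Form E_1^{-1} E_2^{-1} ... E_r^{-1}."""
    P = identity(n)
    for _, E_inv in ops:
        P = mat_mul(P, E_inv)
    return P
```

For the matrix A with rows (1, 2, 3), (4, 5, 6), (7, 8, 10), the reduced form is I and the product of the recorded inverses gives back A, as in the proof; with last row (7, 8, 9) instead, the reduced form has a row of 0's.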

Corollary  1.31.  If the solution procedure for finding the inverse of a square matrix A leads from (A | I) to (I | X), then A is invertible and its inverse is X. Conversely if the solution procedure leads to (R | Y) and R has a row of 0's, then A is not invertible.

Remark.  Proposition  1.27c  shows  that  this  corollary  addresses  the  only 
possible  outcomes  of  the  solution  procedure. 
PROOF.  We apply the equivalence of (a) and (c) in Theorem 1.30 to settle the existence or nonexistence of $A^{-1}$. In the case that $A^{-1}$ exists, we know that the solution procedure has to yield the inverse.  □
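The procedure of Corollary 1.31 can be sketched in Python as follows, assuming F = Q with exact `fractions.Fraction` arithmetic; `invert` is a hypothetical name. Rather than carrying the reduction to the end when some column lacks a corner, the sketch stops early: a missing corner in a column of the left side means that side cannot become I, so by the corollary A is not invertible.

```python
from fractions import Fraction

def invert(A):
    """Attempt to invert the square matrix A by row reducing (A | I).
    Returns the inverse as a list of rows of Fractions, or None when the
    left side cannot reach I (A is not invertible)."""
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        p = next((i for i in range(c, n) if M[i][c] != 0), None)
        if p is None:
            return None   # no corner in column c: a row of 0's on the left
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [x / piv for x in M[c]]
        for i in range(n):
            if i != c and M[i][c] != 0:
                f = M[i][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[c])]
    return [row[n:] for row in M]   # the matrix right of the vertical line

X = invert([[1, 2, 3], [4, 5, 6], [7, 8, 10]])
```

On the 3-by-3 example above, this returns the matrix with rows (−2/3, −4/3, 1), (−2/3, 11/3, −2), (1, −2, 1), which one can confirm by multiplying by A; on the matrix with last row (7, 8, 9) it returns None.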

Corollary  1.32.  Let A be a square matrix. If B is a square matrix such that BA = I, then A is invertible and B is its inverse. If C is a square matrix such that AC = I, then A is invertible with inverse C.

PROOF.  Suppose BA = I. Let X be a column vector with AX = 0. Then X = IX = (BA)X = B(AX) = B0 = 0. Since (d) implies (c) in Theorem 1.30, A is invertible. Then $B = B(AA^{-1}) = (BA)A^{-1} = A^{-1}$, so B is its inverse.

Suppose AC = I. Applying the result of the previous paragraph to C, we conclude that C is invertible with inverse A. Therefore A is invertible with inverse C.  □


7.  Problems 

1.  What is the greatest common divisor of 9894 and 11058?

2.  (a)  Find integers x and y such that 11x + 7y = 1.

(b)  How  are  all  pairs  (x,  y)  of  integers  satisfying  llx  +  7y  =  1  related  to  the 
pair  you  found  in  (a)? 

3.  Let $\{a_n\}_{n \ge 1}$ be a sequence of positive integers, and let d be the largest integer dividing all $a_n$. Prove that d is the greatest common divisor of finitely many of the $a_n$.

4.  Determine the integers n for which there exist integers x and y such that n divides x + y − 2 and 2x − 3y − 3.

5.  Let P(X) and Q(X) be the polynomials $P(X) = X^4 + X^3 + 2X^2 + X + 1$ and $Q(X) = X^5 + 2X^3 + X$ in R[X].

(a)  Find  a  greatest  common  divisor  D(X)  of  P(X)  and  Q(X). 

(b)  Find  polynomials  A  and  B  such  that  AP  +  BQ  =  D. 

6.  Let P(X) and Q(X) be polynomials in R[X]. Prove that if D(X) is a greatest common divisor of P(X) and Q(X) in C[X], then there exists a nonzero complex number c such that cD(X) is in R[X].

7.  (a)  Let P(X) be in R[X], and regard it as in C[X]. Applying the Fundamental Theorem of Algebra and its corollary to P, prove that if $z_j$ is a root of P, then so is $\bar z_j$, and $z_j$ and $\bar z_j$ have the same multiplicity.

(b)  Deduce that any prime polynomial in R[X] has degree at most 2.
8.  (a)  Suppose that a polynomial A(X) of degree > 0 in Q[X] has integer coefficients and leading coefficient 1. Show that if p/q is a root of A(X) with p and q integers such that GCD(p, q) = 1, then p/q is an integer n and n divides the constant term of A(X).

(b)  Deduce that $X^2 - 2$ and $X^3 + X^2 + 1$ are prime in Q[X].

9.  Reduce  the  fraction  8645 / 10465  to  lowest  terms. 

10.  How  many  different  patterns  are  there  of  disjoint  cycle  structures  for  permutations 
of  {1,  2,  3,  4}?  Give  examples  of  each,  telling  how  many  permutations  there  are 
of  each  kind  and  what  the  signs  are  of  each. 


11.  Prove for n ≥ 2 that the number of permutations of {1, …, n} with sign −1 equals the number with sign +1.


12.  Find all solutions X of the system AX = B when $A = \begin{pmatrix}1&2&3\\4&5&6\\7&8&9\end{pmatrix}$ and B is given by

(a)  B = …,

(b)  B = ….


13.  Suppose that a single step in the row reduction process means a single arithmetic operation or a single interchange of two entries. Prove that there exists a constant C such that any square matrix can be transformed into reduced row-echelon form in ≤ Cn³ steps, the matrix being of size n-by-n.

14.  Compute  A  +  B  and  AB  if  A  —  ^  23  j  and  B  —  ^  3  j . 

15.  Prove that if A and B are square matrices with AB = BA, then $(A + B)^n$ is given by the Binomial Theorem: $(A + B)^n = \sum_{k=0}^{n} \binom{n}{k} A^{n-k}B^k$, where $\binom{n}{k}$ is the binomial coefficient $n!/((n-k)!\,k!)$.

16.  Find a formula for the nth power of $\begin{pmatrix}1&1&0\\0&1&1\\0&0&1\end{pmatrix}$, n being a positive integer.


17.  Let D be an n-by-n diagonal matrix with diagonal entries $d_1, \dots, d_n$, and let A be an n-by-n matrix. Compute AD and DA, and give a condition for the equality AD = DA to hold.


18.  Fix n, and let $E_{ij}$ denote the n-by-n matrix that is 1 in the $(i, j)$th entry and is 0 elsewhere. Compute the product $E_{kl}E_{pq}$, expressing the result in terms of matrices $E_{ij}$ and instances of the Kronecker delta.

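19.  Verify that if ad − bc ≠ 0, then $\begin{pmatrix}a&b\\c&d\end{pmatrix}^{-1} = (ad-bc)^{-1}\begin{pmatrix}d&-b\\-c&a\end{pmatrix}$ and that the system $\begin{pmatrix}a&b\\c&d\end{pmatrix}\begin{pmatrix}x\\y\end{pmatrix} = \begin{pmatrix}u\\v\end{pmatrix}$ has the unique solution $\begin{pmatrix}x\\y\end{pmatrix} = (ad-bc)^{-1}\begin{pmatrix}du - bv\\ -cu + av\end{pmatrix}$.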
20.  Which of the following matrices A is invertible? For the invertible ones, find $A^{-1}$.

(a)  $A = \begin{pmatrix}1&2&3\\4&5&6\\7&8&9\end{pmatrix}$,  (b)  $A = \begin{pmatrix}1&2&3\\4&5&6\\7&8&10\end{pmatrix}$,  (c)  $A = \begin{pmatrix}7&4&1\\6&4&1\\4&3&1\end{pmatrix}$.

21.  Can  a  square  matrix  with  a  row  of  0’s  be  invertible?  Why  or  why  not? 

22.  Prove  that  if  the  product  AB  of  two  n-by-n  matrices  is  invertible,  then  A  and  B 
are  invertible. 

23.  Let A be a square matrix such that $A^k = 0$ for some positive integer k. Prove that I + A is invertible.

24.  Give an example of a set S and functions f : S → S and g : S → S such that the composition g ∘ f is the identity function but neither f nor g has an inverse function.

25.  Give an example of two matrices, A of size 1-by-2 and B of size 2-by-1, such that AB = 1, 1 being the 1-by-1 identity matrix. Verify that BA is not the 2-by-2 identity matrix. Give a proof for these sizes that BA can never be the identity matrix.

Problems 26-29 concern least common multiples. Let a and b be positive integers. A common multiple of a and b is an integer N such that a and b both divide N. The least common multiple of a and b is the smallest positive common multiple of a and b. It is denoted by LCM(a, b).

26.  Prove  that  a  and  b  have  a  least  common  multiple. 

27.  If a has a prime factorization given by $a = p_1^{k_1} \cdots p_r^{k_r}$, prove that any positive multiple M of a has a prime factorization given by $M = p_1^{m_1} \cdots p_r^{m_r} q_1^{n_1} \cdots q_s^{n_s}$, where $q_1, \dots, q_s$ are primes not in the list $p_1, \dots, p_r$, where $m_j \ge k_j$ for all j, and where $n_j > 0$ for all j.

28.  (a)  Prove that if $a = p_1^{k_1} \cdots p_r^{k_r}$ and $b = p_1^{l_1} \cdots p_r^{l_r}$ are expansions of a and b as products of powers of r distinct primes $p_1, \dots, p_r$, then $\mathrm{LCM}(a, b) = p_1^{\max(k_1, l_1)} \cdots p_r^{\max(k_r, l_r)}$.

(b)  Prove that if N is any common multiple of a and b, then LCM(a, b) divides N.

(c)  Deduce that ab = GCD(a, b) LCM(a, b).

29.  If $a_1, \dots, a_t$ are positive integers, define their least common multiple to be the smallest positive integer M such that each $a_j$ divides M. Give a formula for this M in terms of expansions of $a_1, \dots, a_t$ as products of powers of distinct primes.
Vector Spaces over Q, R, and C


Abstract.  This  chapter  introduces  vector  spaces  and  linear  maps  between  them,  and  it  goes  on 
to  develop  certain  constructions  of  new  vector  spaces  out  of  old,  as  well  as  various  properties  of 
determinants. 

Sections  1-2  define  vector  spaces,  spanning,  linear  independence,  bases,  and  dimension.  The 
sections  make  use  of  row  reduction  to  establish  dimension  formulas  for  certain  vector  spaces 
associated  with  matrices.  They  conclude  by  stressing  methods  of  calculation  that  have  quietly 
been  developed  in  proofs. 

Section  3  relates  matrices  and  linear  maps  to  each  other,  first  in  the  case  that  the  linear  map  carries 
column  vectors  to  column  vectors  and  then  in  the  general  finite-dimensional  case.  Techniques  are 
developed  for  working  with  the  matrix  of  a  linear  map  relative  to  specified  bases  and  for  changing 
bases.  The  section  concludes  with  a  discussion  of  isomorphisms  of  vector  spaces. 

Sections 4-6 take up constructions of new vector spaces out of old ones, together with corresponding constructions for linear maps. The four constructions of vector spaces in these sections are those of the dual of a vector space, the quotient of two vector spaces, and the direct sum and direct product of two or more vector spaces.

Section 7 introduces determinants of square matrices, together with their calculation and properties. Some of the results that are established are expansion in cofactors, Cramer's rule, and the value of the determinant of a Vandermonde matrix. It is shown that the determinant function is well defined on any linear map from a finite-dimensional vector space to itself.

Section  8  introduces  eigenvectors  and  eigenvalues  for  matrices,  along  with  their  computation. 
Also,  in  this  section  the  characteristic  polynomial  and  the  trace  of  a  square  matrix  are  defined,  and 
all  these  notions  are  reinterpreted  in  terms  of  linear  maps. 

Section  9  proves  the  existence  of  bases  for  infinite-dimensional  vector  spaces  and  discusses  the 
extent  to  which  the  material  of  the  first  eight  sections  extends  from  the  finite-dimensional  case  to  be 
valid  in  the  infinite-dimensional  case. 


1.  Spanning,  Linear  Independence,  and  Bases 

This chapter develops a theory of rational, real, and complex vector spaces. Many readers will already be familiar with some aspects of this theory, particularly in the case of the vector spaces $\mathbb{Q}^n$, $\mathbb{R}^n$, and $\mathbb{C}^n$ of column vectors, where the tools developed from row reduction allow one to introduce geometric notions and to view geometrically the set of solutions to a set of linear equations. Thus we shall be brief about many of these matters, concentrating on the algebraic aspects of the theory. Let F denote any of Q, R, or C. Members of F are called scalars.¹

A vector space over F is a set V with two operations, addition carrying V × V into V and scalar multiplication carrying F × V into V, with the following properties:

(i)  the operation of addition, written +, satisfies

(a)  $v_1 + (v_2 + v_3) = (v_1 + v_2) + v_3$ for all $v_1, v_2, v_3$ in V (associative law),

(b)  there exists an element 0 in V with v + 0 = 0 + v = v for all v in V,

(c)  to each v in V corresponds an element −v in V such that v + (−v) = (−v) + v = 0,

(d)  $v_1 + v_2 = v_2 + v_1$ for all $v_1$ and $v_2$ in V (commutative law);

(ii)  the operation of scalar multiplication, written without a sign, satisfies

(a)  a(bv) = (ab)v for all v in V and all scalars a and b,

(b)  1v = v for all v in V and for the scalar 1;

(iii)  the two operations are related by the distributive laws

(a)  $a(v_1 + v_2) = av_1 + av_2$ for all $v_1$ and $v_2$ in V and for all scalars a,

(b)  (a + b)v = av + bv for all v in V and all scalars a and b.

It  is  immediate  from  these  properties  that 

•  0 is unique (since 0′ = 0′ + 0 = 0),

•  −v is unique (since (−v)′ = (−v)′ + 0 = (−v)′ + (v + (−v)) = ((−v)′ + v) + (−v) = 0 + (−v) = −v),

•  0v = 0 (since 0v = (0 + 0)v = 0v + 0v),

•  (−1)v = −v (since 0 = 0v = (1 + (−1))v = 1v + (−1)v = v + (−1)v),

•  a0 = 0 (since a0 = a(0 + 0) = a0 + a0).

Members  of  V  are  called  vectors. 

Examples. 

(1)  V = $M_{kn}(\mathbb{F})$, the space of all k-by-n matrices. The above properties of a vector space over F were already observed in Section I.6. The vector space $\mathbb{F}^k$ of all k-dimensional column vectors is the special case n = 1, and the vector space F of scalars is the special case k = n = 1.

(2)  Let S be any nonempty set, and let V be the set of all functions from S into F. Define operations by (f + g)(s) = f(s) + g(s) and (cf)(s) = c(f(s)). The operations on the right sides of these equations are those in F, and the properties of a vector space follow from the fact that they hold in F at each s.

¹ All the material of this chapter will ultimately be seen to work when F is replaced by any “field.”
This  point  will  not  be  important  for  us  at  this  stage,  and  we  postpone  considering  it  further  until 
Chapter  IV. 
(3)  More generally than in Example 2, let S be any nonempty set, let U be a
vector  space  over  F,  and  let  V  be  the  set  of  all  functions  from  S  into  U .  Define 
the  operations  as  in  Example  2,  but  interpret  the  operations  on  the  right  sides  of 
the  defining  equations  as  those  in  U .  Then  the  properties  of  a  vector  space  follow 
from  the  fact  that  they  hold  in  U  at  each  s. 

(4)  Let V be any vector space over C, and restrict scalar multiplication to an operation R × V → V. Then V becomes a vector space over R. In particular, C is a vector space over R.

(5)  Let  V  =  F[X]  be  the  set  of  all  polynomials  in  one  indeterminate  with 
coefficients in F, and define addition and scalar multiplication as in Section I.3.
Then  V  is  a  vector  space. 

(6)  Let  V  be  any  vector  space  over  F,  and  let  U  be  any  nonempty  subset  closed 
under  addition  and  scalar  multiplication.  Then  U  is  a  vector  space  over  F.  Such  a 
subset U is called a vector subspace of V; sometimes one says simply subspace if the context is unambiguous.²

(7)  Let V be any vector space over F, and let U = $\{v_\alpha\}$ be any subset of V. A finite linear combination of the members of U is any vector of the form $c_{\alpha_1}v_{\alpha_1} + \cdots + c_{\alpha_n}v_{\alpha_n}$ with each $c_{\alpha_j}$ in F, each $v_{\alpha_j}$ in U, and n ≥ 0. The linear span of U is the set of all finite linear combinations of members of U. It is a vector subspace of V and is denoted by span$\{v_\alpha\}$. By convention, span ∅ = 0.
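For V = Fⁿ, deciding whether a given vector lies in the linear span of finitely many others is a question about a system of linear equations, so the consistency criterion from Chapter I (the reduced row-echelon form has the same number of nonzero rows with and without the column to the right of the vertical line) settles it. The sketch below assumes F = Q, vectors given as tuples of rationals, and the hypothetical helper names `rank` and `in_span`; `rank` counts corner variables using forward elimination only, which gives the same count as full reduction.

```python
from fractions import Fraction

def rank(rows):
    """Number of corners (nonzero rows) after row reduction of `rows`."""
    R = [[Fraction(x) for x in row] for row in rows]
    k = len(R)
    n = len(R[0]) if R else 0
    r = 0
    for c in range(n):
        p = next((i for i in range(r, k) if R[i][c] != 0), None)
        if p is None:
            continue
        R[r], R[p] = R[p], R[r]
        for i in range(r + 1, k):
            f = R[i][c] / R[r][c]
            R[i] = [a - f * b for a, b in zip(R[i], R[r])]
        r += 1
    return r

def in_span(vectors, v):
    """Is v a finite linear combination of the given vectors in F^n?
    Solve x_1 u_1 + ... + x_m u_m = v: the columns of the left side are the
    u_j, and v sits to the right of the vertical line."""
    n = len(v)
    left = [[u[i] for u in vectors] for i in range(n)]
    augmented = [row + [v[i]] for i, row in enumerate(left)]
    return rank(left) == rank(augmented)
```

With no vectors at all, the left side is empty and the test reduces to asking whether v = 0, matching the convention span ∅ = 0.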

(8)  Many  vector  subspaces  arise  in  the  context  of  some  branch  of  mathematics 
after  some  additional  structure  is  imposed.  For  example  let  V  be  the  vector 
space of all functions from $\mathbb{R}^3$ into R, an instance of Example 2. The subset U of continuous members of V is a vector subspace; the closure under addition and scalar multiplication comes down to knowing that addition is a continuous function from $\mathbb{R}^3 \times \mathbb{R}^3$ into $\mathbb{R}^3$ and that scalar multiplication from $\mathbb{R} \times \mathbb{R}^3$ into $\mathbb{R}^3$ is continuous as well. Another example is the subset of twice continuously differentiable members f of V satisfying the partial differential equation


The associative and commutative laws in the definition of “vector space” imply certain more complicated formulas of which the stated laws are special cases. With associativity of addition, if n vectors $v_1, \dots, v_n$ are given, then any way of inserting parentheses into the expression $v_1 + v_2 + \cdots + v_n$ leads to the same result, and a similar conclusion applies to the associativity-like formula a(bv) = (ab)v for scalar multiplication. In the presence of associativity, the commutative law for addition implies that $v_1 + v_2 + \cdots + v_n = v_{\sigma(1)} + v_{\sigma(2)} + \cdots + v_{\sigma(n)}$ for any permutation σ of {1, …, n}. All these facts are proved by inductive arguments, and the details are addressed in Problems 2-3 at the end of the chapter.

² The word “subspace” arises also in the context of metric spaces and more general topological spaces, and the metric-topological notion of subspace is distinct from the vector notion of subspace.

Let V be a vector space over F. A subset $\{v_\alpha\}$ of V spans V or is a spanning
set for V if the linear span of $\{v_\alpha\}$, in the sense of Example 7 above, is all of V.
A subset $\{v_\alpha\}$ is linearly independent if whenever a finite linear combination
$c_{\alpha_1} v_{\alpha_1} + \cdots + c_{\alpha_n} v_{\alpha_n}$ equals the 0 vector, then all the coefficients must be 0:
$c_{\alpha_1} = \cdots = c_{\alpha_n} = 0$. By subtraction we see that in this case any equality of two
finite linear combinations

$$c_{\alpha_1} v_{\alpha_1} + \cdots + c_{\alpha_n} v_{\alpha_n} = d_{\alpha_1} v_{\alpha_1} + \cdots + d_{\alpha_n} v_{\alpha_n}$$

implies that the respective coefficients are equal: $c_{\alpha_j} = d_{\alpha_j}$ for $1 \le j \le n$.

A subset $\{v_\alpha\}$ is a basis if it spans V and is linearly independent. In this case
each member of V has one and only one expansion as a finite linear combination
of the members of $\{v_\alpha\}$.
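For vectors with rational entries, both conditions in these definitions can be tested mechanically by exact row reduction. The Python sketch below is illustrative only: it assumes F = Q (using `fractions.Fraction`), stores vectors as tuples, and the helper names `rref`, `is_linearly_independent`, and `coordinates` are hypothetical, not from the text. Independence is decided by comparing the number of corner columns with the number of vectors; for a basis, the unique expansion coefficients are read off from the reduced augmented matrix.

```python
from fractions import Fraction


def rref(rows):
    """Reduced row-echelon form over Q of a matrix given as a list of rows,
    together with the list of corner (pivot) columns."""
    R = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(R[0]) if R else 0):
        # find a row at or below row r with a nonzero entry in column c
        p = next((i for i in range(r, len(R)) if R[i][c] != 0), None)
        if p is None:
            continue
        R[r], R[p] = R[p], R[r]                # interchange two rows
        piv = R[r][c]
        R[r] = [x / piv for x in R[r]]         # make the corner entry 1
        for i in range(len(R)):
            if i != r and R[i][c] != 0:        # clear the rest of column c
                f = R[i][c]
                R[i] = [a - f * b for a, b in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
        if r == len(R):
            break
    return R, pivots


def is_linearly_independent(vectors):
    """True when the vectors, used as rows, give one corner per vector."""
    if not vectors:
        return True                            # the empty set is independent
    return len(rref(vectors)[1]) == len(vectors)


def coordinates(basis, v):
    """Unique coefficients c_j with v = c_1 b_1 + ... + c_n b_n."""
    n = len(basis)
    # augmented matrix whose columns are b_1, ..., b_n and then v
    aug = [[b[i] for b in basis] + [v[i]] for i in range(len(v))]
    R, pivots = rref(aug)
    if n in pivots:
        raise ValueError("v is not in the span of the given vectors")
    if pivots != list(range(n)):
        raise ValueError("the given vectors are not linearly independent")
    return [R[j][n] for j in range(n)]
```

For example, with the basis (1, 0, 0), (1, 1, 0), (1, 1, 1) of Q³, `coordinates(basis, (2, 3, 4))` gives the coefficients −1, −1, 4, since (2, 3, 4) = −(1, 0, 0) − (1, 1, 0) + 4(1, 1, 1).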


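Example. In $F^n$, the vectors

$$e_1 = \begin{pmatrix} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \quad e_2 = \begin{pmatrix} 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \quad e_3 = \begin{pmatrix} 0 \\ 0 \\ 1 \\ \vdots \\ 0 \end{pmatrix}, \quad \ldots, \quad e_n = \begin{pmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 1 \end{pmatrix}$$

form a basis of $F^n$ called the standard basis of $F^n$.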


Proposition  2.1.  Let  V  be  a  vector  space  over  F. 

(a) If $\{v_\alpha\}$ is a linearly independent subset of V that is maximal with respect to
the property of being linearly independent (i.e., has the property of being strictly
contained in no linearly independent set), then $\{v_\alpha\}$ is a basis of V.

(b) If $\{v_\alpha\}$ is a spanning set for V that is minimal with respect to the property
of spanning (i.e., has the property of strictly containing no spanning set), then
$\{v_\alpha\}$ is a basis of V.

PROOF. For (a), let v be given. We are to show that v is in the span of $\{v_\alpha\}$.
Without loss of generality, we may assume that v is not in the set $\{v_\alpha\}$ itself.
By the assumed maximality, $\{v_\alpha\} \cup \{v\}$ is not linearly independent, and hence
$cv + c_{\alpha_1} v_{\alpha_1} + \cdots + c_{\alpha_n} v_{\alpha_n} = 0$ for some scalars $c, c_{\alpha_1}, \ldots, c_{\alpha_n}$ not all 0. Here
$c \ne 0$ since $\{v_\alpha\}$ is linearly independent. Then $v = -c^{-1} c_{\alpha_1} v_{\alpha_1} - \cdots - c^{-1} c_{\alpha_n} v_{\alpha_n}$,
and v is exhibited as a member of the linear span of $\{v_\alpha\}$.

For (b), suppose that $c_{\alpha_1} v_{\alpha_1} + \cdots + c_{\alpha_n} v_{\alpha_n} = 0$ with $c_{\alpha_1}, \ldots, c_{\alpha_n}$ not all 0. Say
$c_{\alpha_1} \ne 0$. Then we can solve for $v_{\alpha_1}$ and see that $v_{\alpha_1}$ is a finite linear combination of
$v_{\alpha_2}, \ldots, v_{\alpha_n}$. Substitution shows that any finite linear combination of the $v_\alpha$'s is a
finite linear combination of the $v_\alpha$'s other than $v_{\alpha_1}$, and we obtain a contradiction
to the assumed minimality of the spanning set. □
Proposition 2.2. Let V be a vector space over F. If V has a finite spanning
set $\{v_1, \ldots, v_m\}$, then any linearly independent set in V has ≤ m elements.

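PROOF. It is enough to show that no subset of m + 1 vectors can be linearly
independent. Arguing by contradiction, suppose that $\{u_1, \ldots, u_n\}$ is a linearly
independent set with $n = m + 1$. Write

$$\begin{aligned}
u_1 &= c_{11}v_1 + c_{21}v_2 + \cdots + c_{m1}v_m, \\
&\;\;\vdots \\
u_n &= c_{1n}v_1 + c_{2n}v_2 + \cdots + c_{mn}v_m.
\end{aligned}$$

The system of linear equations

$$\begin{aligned}
c_{11}x_1 + \cdots + c_{1n}x_n &= 0, \\
&\;\;\vdots \\
c_{m1}x_1 + \cdots + c_{mn}x_n &= 0
\end{aligned}$$

is a homogeneous system of linear equations with more unknowns than equations,
and Proposition 1.26d shows that it has a nonzero solution $(x_1, \ldots, x_n)$. Then
we have

$$\begin{aligned}
x_1u_1 + \cdots + x_nu_n &= c_{11}x_1v_1 + c_{21}x_1v_2 + \cdots + c_{m1}x_1v_m \\
&\quad + \cdots \\
&\quad + c_{1n}x_nv_1 + c_{2n}x_nv_2 + \cdots + c_{mn}x_nv_m \\
&= (c_{11}x_1 + \cdots + c_{1n}x_n)v_1 + \cdots + (c_{m1}x_1 + \cdots + c_{mn}x_n)v_m \\
&= 0,
\end{aligned}$$

in contradiction to the assumed linear independence of $\{u_1, \ldots, u_n\}$. □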
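The proof is constructive: a nonzero solution of the homogeneous system with coefficients $c_{ij}$ is an explicit dependence among $u_1, \ldots, u_n$. The sketch below assumes F = Q; the helper names and the sample vectors are hypothetical, and the nonzero solution is taken from a null-space computation of the kind described under Method 3 of the methods of calculation.

```python
from fractions import Fraction


def rref(rows):
    """Reduced row-echelon form over Q, together with the corner columns."""
    R = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(R[0]) if R else 0):
        p = next((i for i in range(r, len(R)) if R[i][c] != 0), None)
        if p is None:
            continue
        R[r], R[p] = R[p], R[r]
        piv = R[r][c]
        R[r] = [x / piv for x in R[r]]
        for i in range(len(R)):
            if i != r and R[i][c] != 0:
                f = R[i][c]
                R[i] = [a - f * b for a, b in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
        if r == len(R):
            break
    return R, pivots


def null_space(A):
    """Basis of {x : Ax = 0}, one vector per non-corner (independent) column."""
    R, pivots = rref(A)
    n = len(A[0])
    basis = []
    for free in (j for j in range(n) if j not in pivots):
        x = [Fraction(0)] * n
        x[free] = Fraction(1)               # this independent variable is 1, the others 0
        for row, p in zip(R, pivots):
            x[p] = -row[free]               # solve each nonzero row for its corner variable
        basis.append(x)
    return basis


# m = 2 spanning vectors in Q^3 and n = m + 1 = 3 vectors u_j in their span;
# column j of C holds the coefficients of u_j = c_{1j} v_1 + c_{2j} v_2
v = [(1, 0, 1), (0, 1, 1)]
C = [[1, 0, 2],
     [0, 1, 3]]
u = [[sum(C[i][j] * v[i][k] for i in range(2)) for k in range(3)] for j in range(3)]

x = null_space(C)[0]                        # nonzero: more unknowns than equations
combo = [sum(x[j] * u[j][k] for j in range(3)) for k in range(3)]
```

Here x = (−2, −3, 1), and `combo` is the zero vector, so $-2u_1 - 3u_2 + u_3 = 0$ exhibits the dependence that the proof predicts.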

Corollary 2.3. If the vector space V has a finite spanning set $\{v_1, \ldots, v_m\}$,
then

(a) $\{v_1, \ldots, v_m\}$ has a subset that is a basis,

(b) any linearly independent set in V can be extended to a basis,

(c) V has a basis,

(d) any two bases have the same finite number of elements, necessarily ≤ m.

Remarks.  In  this  case  we  say  that  V  is  finite-dimensional,  and  the  number 
of  elements  in  a  basis  is  called  the  dimension  of  V,  written  dim  V.  If  V  has  no 
finite  spanning  set,  we  say  that  V  is  infinite-dimensional.  A  suitable  analog  of 
the  conclusion  in  Corollary  2.3  is  valid  in  the  infinite-dimensional  case,  but  the 
proof  is  more  complicated.  We  take  up  the  infinite-dimensional  case  in  Section  9. 
PROOF. By discarding elements of the set $\{v_1, \ldots, v_m\}$ one at a time if necessary
and by applying Proposition 2.1b, we obtain (a). For (b), we see from
Proposition 2.2 that the given linearly independent set has ≤ m elements. If we
adjoin elements to it one at a time so as to obtain larger linearly independent sets,
Proposition 2.2 shows that there must be a stage at which we can proceed no
further without violating linear independence. Proposition 2.1a then says that we
have a basis. For (c), we observe that (a) has already produced a basis. Any two
bases have the same number of elements, by two applications of Proposition 2.2,
and this proves (d). □
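Part (b) can be made concrete in $F^n$. The following sketch assumes F = Q and, as one possible choice not specified in the proof, draws the candidate vectors to adjoin from the standard basis $e_1, \ldots, e_n$; the helper names are hypothetical. Each $e_j$ that is not adjoined already lies in the span of the current set, so the final set spans $F^n$ and is therefore a basis.

```python
from fractions import Fraction


def rref(rows):
    """Reduced row-echelon form over Q, together with the corner columns."""
    R = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(R[0]) if R else 0):
        p = next((i for i in range(r, len(R)) if R[i][c] != 0), None)
        if p is None:
            continue
        R[r], R[p] = R[p], R[r]
        piv = R[r][c]
        R[r] = [x / piv for x in R[r]]
        for i in range(len(R)):
            if i != r and R[i][c] != 0:
                f = R[i][c]
                R[i] = [a - f * b for a, b in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
        if r == len(R):
            break
    return R, pivots


def rank(rows):
    """Dimension of the span of the given row vectors."""
    return len(rref(rows)[1]) if rows else 0


def extend_to_basis(vectors, n):
    """Adjoin e_1, ..., e_n in turn whenever doing so keeps linear independence."""
    basis = [list(v) for v in vectors]
    if rank(basis) != len(basis):
        raise ValueError("the starting set must be linearly independent")
    for j in range(n):
        e_j = [1 if i == j else 0 for i in range(n)]
        if rank(basis + [e_j]) == len(basis) + 1:   # e_j is not yet in the span
            basis.append(e_j)
    return basis


result = extend_to_basis([(1, 1, 0)], 3)
```

Starting from (1, 1, 0) in Q³, the vector $e_1$ is adjoined, $e_2 = (1,1,0) - e_1$ is skipped, and $e_3$ is adjoined, giving `[[1, 1, 0], [1, 0, 0], [0, 0, 1]]`.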

Examples. The vector space $M_{kn}(F)$ of k-by-n matrices has dimension kn.
The vector space of all polynomials in one indeterminate is infinite-dimensional
because, for every n, the subspace consisting of 0 and of all polynomials of degree ≤ n has
dimension n + 1, while a finite-dimensional space has no subspace of dimension larger than its own (Corollary 2.5 below).

Corollary  2.4.  If  V  is  a  finite-dimensional  vector  space  with  dim  V  =  n,  then 
any  spanning  set  of  n  elements  is  a  basis  of  V,  and  any  linearly  independent  set 
of n elements is a basis of V. Consequently any n-dimensional vector subspace
U  of  V  coincides  with  V. 

PROOF.  These  conclusions  are  immediate  from  parts  (a)  and  (b)  of  Corollary 
2.3  if  we  take  part  (d)  into  account.  □ 

Corollary  2.5.  If  V  is  a  finite-dimensional  vector  space  and  U  is  a  vector 
subspace of V, then U is finite-dimensional, and dim U ≤ dim V.

PROOF. Let $\{v_1, \ldots, v_m\}$ be a basis of V. According to Proposition 2.2, any
linearly independent set in U has ≤ m elements, being linearly independent in
V. We can thus choose a maximal linearly independent subset of U with ≤ m
elements, and Proposition 2.1a shows that the result is a basis of U. □


2.  Vector  Spaces  Defined  by  Matrices 

Let A be a member of $M_{kn}(F)$, thus a k-by-n matrix. The row space of A is the
linear span of the rows of A, regarded as a vector subspace of the vector space of
all n-dimensional row vectors. The column space of A is the linear span of the
columns, regarded as a vector subspace of k-dimensional column vectors. The
null space of A is the vector subspace of n-dimensional column vectors v for
which Av = 0, where Av is the matrix product. The fact that this last space
is a vector subspace follows from the properties $A(v_1 + v_2) = Av_1 + Av_2$ and
$A(cv) = c(Av)$ of matrix multiplication.
We can use matrix multiplication to view the matrix A as defining a function
$v \mapsto Av$ of $F^n$ to $F^k$. This function satisfies the properties just listed,

$$A(v_1 + v_2) = Av_1 + Av_2 \quad\text{and}\quad A(cv) = c(Av),$$

and we shall consider further functions with these two properties starting in the
next section. In terms of this function, the null space of A is the set in the domain
$F^n$ mapped to 0. Because of these same properties and because the product $Ae_j$
of A and the jth standard basis vector $e_j$ in $F^n$ is the jth column of A, the column
space of A is the image of the function $v \mapsto Av$ as a subset of the range $F^k$.

Theorem 2.6. If A is in $M_{kn}(F)$, then
dim(column space(A)) + dim(null space(A)) = #(columns of A) = n.

PROOF. Corollary 2.5 says that the null space is finite-dimensional, being a
vector subspace of $F^n$, and Corollary 2.3c shows that the null space has a basis,
say $\{v_1, \ldots, v_r\}$. By Corollary 2.3b we can adjoin vectors $v_{r+1}, \ldots, v_n$ so that
$\{v_1, \ldots, v_n\}$ is a basis of $F^n$. If v is in $F^n$, we can expand v in terms of this basis
as $v = c_1v_1 + \cdots + c_nv_n$. Application of A gives

$$\begin{aligned}
Av &= A(c_1v_1 + \cdots + c_nv_n) = c_1Av_1 + \cdots + c_rAv_r + c_{r+1}Av_{r+1} + \cdots + c_nAv_n \\
&= c_{r+1}Av_{r+1} + \cdots + c_nAv_n,
\end{aligned}$$

since $Av_1 = \cdots = Av_r = 0$. Therefore the vectors $Av_{r+1}, \ldots, Av_n$ span the column space.

Let us see that they form a basis for the column space. Thus suppose that
$c_{r+1}Av_{r+1} + \cdots + c_nAv_n = 0$. Then $A(c_{r+1}v_{r+1} + \cdots + c_nv_n) = 0$, and
$c_{r+1}v_{r+1} + \cdots + c_nv_n$ is in the null space. Since $\{v_1, \ldots, v_r\}$ is a basis of the null
space, we have

$$c_{r+1}v_{r+1} + \cdots + c_nv_n = a_1v_1 + \cdots + a_rv_r$$

for suitable scalars $a_1, \ldots, a_r$. Therefore

$$(-a_1)v_1 + \cdots + (-a_r)v_r + c_{r+1}v_{r+1} + \cdots + c_nv_n = 0.$$

Since $v_1, \ldots, v_n$ are linearly independent, all the $c_j$ are 0. We conclude that
$Av_{r+1}, \ldots, Av_n$ are linearly independent and therefore form a basis of the column
space.

As a result, we have established in the identity $r + (n - r) = n$ that $n - r$
can be interpreted as dim(column space(A)) and that r can be interpreted as
dim(null space(A)). The theorem follows. □
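As a quick sanity check of Theorem 2.6 on a small example, the sketch below (assuming F = Q; the matrix and the helper names are hypothetical) computes the dimension of the column space as the number of corners of the row-reduced transpose, and the dimension of the null space as the number of basis vectors produced by solving Av = 0.

```python
from fractions import Fraction


def rref(rows):
    """Reduced row-echelon form over Q, together with the corner columns."""
    R = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(R[0]) if R else 0):
        p = next((i for i in range(r, len(R)) if R[i][c] != 0), None)
        if p is None:
            continue
        R[r], R[p] = R[p], R[r]
        piv = R[r][c]
        R[r] = [x / piv for x in R[r]]
        for i in range(len(R)):
            if i != r and R[i][c] != 0:
                f = R[i][c]
                R[i] = [a - f * b for a, b in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
        if r == len(R):
            break
    return R, pivots


def null_space(A):
    """One solution of Ax = 0 per independent variable."""
    R, pivots = rref(A)
    n = len(A[0])
    basis = []
    for free in (j for j in range(n) if j not in pivots):
        x = [Fraction(0)] * n
        x[free] = Fraction(1)
        for row, p in zip(R, pivots):
            x[p] = -row[free]
        basis.append(x)
    return basis


def transpose(A):
    return [list(col) for col in zip(*A)]


A = [[1, 2, 3],
     [2, 4, 6]]
n = len(A[0])
dim_column_space = len(rref(transpose(A))[1])   # the columns of A are the rows of A^t
dim_null_space = len(null_space(A))
assert dim_column_space + dim_null_space == n   # 1 + 2 == 3
```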
Proposition 2.7. If A is in $M_{kn}(F)$, then each elementary row operation on A
preserves the row space of A.

PROOF. Let the rows of A be $r_1, \ldots, r_k$. Their span is unchanged if we
interchange two of them or multiply one of them by a nonzero scalar. If we
replace the row $r_i$ by $r_i + cr_j$ with $j \ne i$, then the span is unchanged since

$$a_ir_i + a_jr_j = a_i(r_i + cr_j) + (a_j - a_ic)r_j$$

shows that any finite linear combination of the old rows is a finite linear combination
of the new rows and since

$$b_i(r_i + cr_j) + b_jr_j = b_ir_i + (b_ic + b_j)r_j$$

shows the reverse. □

Theorem 2.8. If A in $M_{kn}(F)$ has reduced row-echelon form R, then

dim(row space(A)) = dim(row space(R)) = #(nonzero rows of R) = #(corner variables of R)

and

dim(null space(A)) = dim(null space(R)) = #(independent variables of R).

PROOF. The first equality in the first conclusion is immediate from Proposition
2.7, and the last equality of that conclusion is known from the method of row
reduction. To see the middle equality, we need to see that the nonzero rows of R
are linearly independent. Let these rows be $r_1, \ldots, r_t$. For each i with $1 \le i \le t$,
the index of the first nonzero entry of $r_i$ was denoted by $j(i)$ in Section 1.5. That
entry has to be 1, and the other rows have to be 0 in that entry, by definition of
reduced row-echelon form. If a finite linear combination $c_1r_1 + \cdots + c_tr_t$ is 0,
then inspection of the $j(i)$th entry yields the equality $c_i = 0$, and thus we conclude
that all the coefficients are 0. This proves the desired linear independence.

The first equality in the second conclusion is by the solution procedure for
homogeneous systems of equations in Section 1.5; the set of solutions is unchanged
by each row operation. To see the second equality, we recall that the solution is
expressed as a finite linear combination of specific vectors, the coefficients being
the independent variables. What the second equality is asserting is that these
vectors form a basis of the space of solutions. We are thus to prove that they are
linearly independent. Let the independent variables be certain $x_j$'s, and let the
corresponding vectors be $v_j$'s. Then we know that the vector $v_j$ has jth entry 1
and that all the other vectors have jth entry 0. If a finite linear combination of the
vectors is 0, then examination of the jth entry shows that the jth coefficient is 0.
The result follows. □
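The proof of Theorem 2.8 can be traced on a concrete matrix. This sketch (F = Q assumed; the helper names and the 3-by-4 example are hypothetical) row reduces A, counts nonzero rows and corner variables, and builds one null-space vector per independent variable by setting that variable to 1 and the other independent variables to 0, which is the shape of solution used in the proof.

```python
from fractions import Fraction


def rref(rows):
    """Reduced row-echelon form over Q, together with the corner columns."""
    R = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(R[0]) if R else 0):
        p = next((i for i in range(r, len(R)) if R[i][c] != 0), None)
        if p is None:
            continue
        R[r], R[p] = R[p], R[r]
        piv = R[r][c]
        R[r] = [x / piv for x in R[r]]
        for i in range(len(R)):
            if i != r and R[i][c] != 0:
                f = R[i][c]
                R[i] = [a - f * b for a, b in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
        if r == len(R):
            break
    return R, pivots


def null_space(A):
    """One solution of Ax = 0 per independent variable."""
    R, pivots = rref(A)
    n = len(A[0])
    basis = []
    for free in (j for j in range(n) if j not in pivots):
        x = [Fraction(0)] * n
        x[free] = Fraction(1)               # this independent variable is 1, the others 0
        for row, p in zip(R, pivots):
            x[p] = -row[free]               # each corner variable is then determined
        basis.append(x)
    return basis


A = [[1, 2, 1, 0],
     [2, 4, 0, 2],
     [1, 2, 2, -1]]
R, corners = rref(A)
nonzero_rows = [row for row in R if any(row)]
independent = [j for j in range(len(A[0])) if j not in corners]
N = null_space(A)

# every null-space basis vector really solves Ax = 0
for x in N:
    assert all(sum(a * xi for a, xi in zip(row, x)) == 0 for row in A)
```

Here R has nonzero rows (1, 2, 0, 1) and (0, 0, 1, −1), the corner variables are $x_1, x_3$, the independent variables are $x_2, x_4$, and the null space has basis (−2, 1, 0, 0), (−1, 0, 1, 1), so both dimension counts in the theorem equal 2.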
Corollary 2.9. If A is in $M_{kn}(F)$, then

dim(row space(A)) + dim(null space(A)) = #(columns of A) = n.

PROOF. We add the two formulas in Theorem 2.8 and see that

dim(row space(A)) + dim(null space(A))

equals the sum #(corner variables of R) + #(independent variables of R). Since
each of the n variables is either a corner variable or an independent variable, and not both, this sum is n, and the
result follows. □

Corollary 2.10. If A is in $M_{kn}(F)$, then

dim(row  space(A))  =  dim(column  space(A)). 

Remark.  The  common  value  of  the  dimension  of  the  row  space  of  A  and  the 
dimension  of  the  column  space  of  A  is  called  the  rank  of  A.  Some  authors  use 
the  separate  terms  “row  rank”  and  “column  rank”  for  the  two  sides,  and  then  the 
result  is  that  these  integers  are  equal. 

PROOF.  This  follows  by  comparing  Theorem  2.6  and  Corollary  2.9.  □ 

Although the above results may seem abstract at first, methods
of calculation for all the objects in question have quietly been carried along in
the proofs, with everything rooted in the method of row reduction. In effect, the proofs
already show that these methods of calculation do what they
are supposed to do. If A is in $M_{kn}(F)$, the transpose of A, denoted by $A^t$, is the
member of $M_{nk}(F)$ with entries $(A^t)_{ij} = A_{ji}$. In particular, the transpose of a
row vector is a column vector, and vice versa.

Methods  of  calculation. 

(1)  Basis  of  the  row  space  of  A.  Row  reduce  A,  and  use  the  nonzero  rows  of 
the  reduced  row-echelon  form. 

(2) Basis of the column space of A. Transpose A, compute a basis of the row
space of $A^t$ by Method 1, and transpose the resulting row vectors into column
vectors.

(3)  Basis  of  the  null  space  of  A.  Use  the  solution  procedure  for  Av  =  0  given 
in  Section  1.5.  The  set  of  solutions  is  given  as  all  finite  linear  combinations  of 
certain  column  vectors,  the  coefficients  being  the  independent  variables.  The 
column  vectors  that  are  obtained  form  a  basis  of  the  null  space. 

(4) Basis of the linear span of the column vectors $v_1, \ldots, v_n$. Arrange the
columns into a matrix A. Then the linear span is the column space of A, and a
basis can be determined by Method 2.
(5) Extension of a linearly independent set $\{v_1, \ldots, v_r\}$ of column vectors in
$F^n$ to a basis of $F^n$. Arrange the columns into a matrix, transpose, and row reduce.
Adjoin additional row vectors, one for each independent variable, as follows: if
$x_j$ is an independent variable, then the row vector corresponding to $x_j$ is to be 1
in the jth entry and 0 elsewhere. Transpose these additional row vectors so that
they become column vectors, and these are vectors that may be adjoined to obtain
a basis.

(6) Shrinking of a set $\{v_1, \ldots, v_r\}$ of column vectors to a subset that is a
basis for the linear span of $\{v_1, \ldots, v_r\}$. For each i with $0 \le i \le r$, compute
$d_i = \dim(\operatorname{span}\{v_1, \ldots, v_i\})$. Retain $v_i$ for $i > 0$ if $d_{i-1} < d_i$, and discard $v_i$
otherwise.
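Methods 5 and 6 translate directly into code. In this sketch (F = Q assumed; the helper names are hypothetical), column vectors are stored as tuples, so placing them as the rows of a matrix performs the transpose step of Method 5; the input to Method 5 is assumed to be linearly independent, as the method requires.

```python
from fractions import Fraction


def rref(rows):
    """Reduced row-echelon form over Q, together with the corner columns."""
    R = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(R[0]) if R else 0):
        p = next((i for i in range(r, len(R)) if R[i][c] != 0), None)
        if p is None:
            continue
        R[r], R[p] = R[p], R[r]
        piv = R[r][c]
        R[r] = [x / piv for x in R[r]]
        for i in range(len(R)):
            if i != r and R[i][c] != 0:
                f = R[i][c]
                R[i] = [a - f * b for a, b in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
        if r == len(R):
            break
    return R, pivots


def rank(vectors):
    """Dimension of the span of the given vectors."""
    return len(rref(vectors)[1]) if vectors else 0


def extend_by_method_5(vectors, n):
    """Adjoin e_j for each independent variable x_j of the row-reduced transpose."""
    _, corners = rref(vectors)
    extra = [tuple(1 if i == j else 0 for i in range(n))
             for j in range(n) if j not in corners]
    return list(vectors) + extra


def shrink_by_method_6(vectors):
    """Keep v_i exactly when d_i = dim span{v_1, ..., v_i} exceeds d_{i-1}."""
    kept, d_prev = [], 0                    # d_0 = dim span of the empty set = 0
    for i in range(1, len(vectors) + 1):
        d_i = rank(vectors[:i])
        if d_prev < d_i:
            kept.append(vectors[i - 1])
        d_prev = d_i
    return kept


ext = extend_by_method_5([(1, 1, 0), (0, 1, 1)], 3)
kept = shrink_by_method_6([(1, 0), (2, 0), (0, 1), (1, 1)])
```

For the first call, the transpose row reduces to corners in the first two columns, so $x_3$ is the only independent variable and $e_3$ is adjoined; the second call keeps (1, 0) and (0, 1) and discards the other two vectors.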


3.  Linear  Maps 

In this section we discuss linear maps, first in the setting of functions from $F^n$ to
$F^k$ and then in the setting of functions between two vector spaces over F. Much of
the discussion will center on making computations for such functions by means
of matrices.

We have seen that any k-by-n matrix A defines a function $L : F^n \to F^k$ by
$L(v) = Av$ and that this function satisfies

$$L(u + v) = L(u) + L(v),$$

$$L(cv) = cL(v),$$

for all u and v in $F^n$ and all scalars c. A function $L : F^n \to F^k$ satisfying these
two conditions is said to be linear, or F linear if the scalars need emphasizing.
Traditional names for such functions are linear maps, linear mappings, and
linear transformations.³ Thus matrices yield linear maps. Here is a converse.

Proposition 2.11. If $L : F^n \to F^k$ is a linear map, then there exists a unique
k-by-n matrix A such that $L(v) = Av$ for all v in $F^n$.

Remark.  The  proof  will  show  how  to  obtain  the  matrix  A. 

PROOF. For $1 \le j \le n$, let $e_j$ be the jth standard basis vector of $F^n$, having 1 in
its jth entry and 0's elsewhere, and let the jth column of A be the k-dimensional
column vector $L(e_j)$. If v is the column vector $(c_1, c_2, \ldots, c_n)$, then

$$L(v) = L\Big(\sum_{j=1}^n c_je_j\Big) = \sum_{j=1}^n L(c_je_j) = \sum_{j=1}^n c_jL(e_j) = \sum_{j=1}^n c_j\,(j\text{th column of } A).$$

If $L(v)_i$ denotes the ith entry of the column vector $L(v)$, this equality says that

$$L(v)_i = \sum_{j=1}^n c_jA_{ij}.$$

The right side is the ith entry of Av, and hence $L(v) = Av$. This proves existence.
For uniqueness we observe from the formula $L(e_j) = Ae_j$ that the jth column of
A has to be $L(e_j)$ for each j, and therefore A is unique. □

³ The term linear function is particularly appropriate when the emphasis is on the fact that a
certain function is linear. The term linear operator is used also, particularly when the context has
something to do with analysis.
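The recipe in the proof, taking the jth column of A to be $L(e_j)$, is easy to mechanize. In the sketch below, the map `L` is a hypothetical example of a linear map from Q³ to Q², and the helper names `matrix_of` and `matvec` are illustrative, not from the text.

```python
def matrix_of(L, n):
    """Matrix of a linear map L on column vectors of length n: column j is L(e_j)."""
    columns = [L([1 if i == j else 0 for i in range(n)]) for j in range(n)]
    k = len(columns[0])
    return [[columns[j][i] for j in range(n)] for i in range(k)]


def matvec(A, v):
    """The matrix product Av, with v given as a list of entries."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def L(v):
    # a hypothetical linear map from Q^3 to Q^2
    x, y, z = v
    return [x + 2 * y, 3 * z - x]


A = matrix_of(L, 3)
```

Here $L(e_1) = (1, -1)$, $L(e_2) = (2, 0)$, and $L(e_3) = (0, 3)$, so A has rows (1, 2, 0) and (−1, 0, 3), and `matvec(A, v)` agrees with `L(v)` for every v.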

In the special case of linear maps from $F^n$ to $F^k$, the proof shows that two linear
maps  that  agree  on  the  members  of  the  standard  basis  are  equal  on  all  vectors. 
We  shall  give  a  generalization  of  this  fact  as  Proposition  2.13  below. 

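Example 1. Let $L : \mathbb{R}^2 \to \mathbb{R}^2$ be rotation about the origin counterclockwise
through the angle θ. Taking L to be defined geometrically, one finds from the
parallelogram rule for addition of vectors that L is linear. Computation shows
that

$$L\begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} \cos\theta \\ \sin\theta \end{pmatrix} \quad\text{and}\quad L\begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} -\sin\theta \\ \cos\theta \end{pmatrix}.$$

Applying Proposition 2.11 and the prescription for forming the matrix A given in the proof of the proposition,
we see that

$$L(v) = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} v$$

for all v in $\mathbb{R}^2$.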
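The matrix of Example 1 can be checked numerically. This floating-point sketch (the helper names are hypothetical) builds the rotation matrix from the images of the two standard basis vectors and applies it to a sample vector; results agree with exact values only up to rounding.

```python
import math


def rotation_matrix(theta):
    # columns are L(e_1) = (cos θ, sin θ) and L(e_2) = (-sin θ, cos θ)
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s],
            [s, c]]


def matvec(A, v):
    """The matrix product Av, with v given as a list of entries."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


# a quarter turn should carry (1, 1) to (-1, 1), up to rounding
w = matvec(rotation_matrix(math.pi / 2), [1, 1])
```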

We can add two linear maps $L : F^n \to F^k$ and $M : F^n \to F^k$ by adding their
values at corresponding points: $(L + M)(v) = L(v) + M(v)$. In addition, we
can multiply a linear map by a scalar by multiplying its values. Then L + M
and cL are linear, and it follows that the set of linear maps from $F^n$ to $F^k$ is a
vector subspace of the vector space of all functions from $F^n$ to $F^k$, hence is itself
a vector space. The customary notation for this vector space is $\operatorname{Hom}_F(F^n, F^k)$;
the symbol Hom refers to the validity of the rule $L(u + v) = L(u) + L(v)$, and
the subscript F refers to the validity of the additional rule $L(cv) = cL(v)$ for all
c in F.

If  L  corresponds  to  the  matrix  A  and  M  corresponds  to  the  matrix  B,  then 
L  +  M  corresponds  to  A  +  B  and  cL  corresponds  to  cA.  The  next  proposition 
shows  that  composition  of  linear  maps  corresponds  to  multiplication  of  matrices. 

Proposition 2.12. Let $L : F^n \to F^m$ be the linear map corresponding to an
m-by-n matrix A, and let $M : F^m \to F^k$ be the linear map corresponding to a
k-by-m matrix B. Then the composite function $M \circ L : F^n \to F^k$ is linear, and
it corresponds to the k-by-n matrix BA.

PROOF. The function $M \circ L$ satisfies $(M \circ L)(u + v) = M(L(u + v)) =
M(Lu + Lv) = M(Lu) + M(Lv) = (M \circ L)(u) + (M \circ L)(v)$, and similarly it
satisfies $(M \circ L)(cv) = c(M \circ L)(v)$. Therefore it is linear. The correspondence
of linear maps to matrices and the associativity of matrix multiplication together
give $(M \circ L)(v) = M(L(v)) = B(Lv) = B(Av) = (BA)v$, and therefore
$M \circ L$ corresponds to BA. □
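Proposition 2.12 can be checked on a small example: building the matrix of $M \circ L$ column by column, as in the proof of Proposition 2.11, should reproduce the product BA. The matrices and the helper names below are hypothetical illustrations.

```python
def matmul(B, A):
    """Product BA of a k-by-m matrix B and an m-by-n matrix A (lists of rows)."""
    return [[sum(B[i][l] * A[l][j] for l in range(len(A))) for j in range(len(A[0]))]
            for i in range(len(B))]


def matvec(A, v):
    """The matrix product Av, with v given as a list of entries."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def matrix_of(F, n):
    """Matrix whose jth column is F(e_j), as in the proof of Proposition 2.11."""
    cols = [F([1 if i == j else 0 for i in range(n)]) for j in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(len(cols[0]))]


A = [[1, 2], [0, 1], [3, -1]]   # 3-by-2: L maps F^2 to F^3
B = [[1, 0, 2], [-1, 1, 0]]     # 2-by-3: M maps F^3 to F^2
composite = matrix_of(lambda v: matvec(B, matvec(A, v)), 2)
```

Both `composite` and `matmul(B, A)` come out as the 2-by-2 matrix with rows (7, 0) and (−1, −1).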
Now let us enlarge the setting for our discussion, treating arbitrary linear maps
$L : U \to V$ between vector spaces over F. We say that $L : U \to V$ is linear, or
F linear, if

$$L(u + v) = L(u) + L(v),$$

$$L(cv) = cL(v),$$

for all u and v in U and all scalars c. As with the special case that $U = F^n$ and
$V = F^k$, linear functions are called linear maps, linear mappings, and linear
transformations. The set of all linear maps $L : U \to V$ is a vector space over F
and is denoted by $\operatorname{Hom}_F(U, V)$. The following result is fundamental in working
with linear maps.

Proposition 2.13. Let U and V be vector spaces over F, and let Γ be a basis
of U. Then to each function $\ell : \Gamma \to V$ corresponds one and only one linear
map $L : U \to V$ whose restriction to Γ has $L\big|_\Gamma = \ell$.

Remark. We refer to L as the linear extension of ℓ.

PROOF. Suppose that $\ell : \Gamma \to V$ is given. Since Γ is a basis of U, each
element of U has a unique expansion as a finite linear combination of members
of Γ. Say that $u = c_{\alpha_1}u_{\alpha_1} + \cdots + c_{\alpha_r}u_{\alpha_r}$. Then the requirement of linearity
on L forces

$$L(u) = L(c_{\alpha_1}u_{\alpha_1} + \cdots + c_{\alpha_r}u_{\alpha_r}) = c_{\alpha_1}L(u_{\alpha_1}) + \cdots + c_{\alpha_r}L(u_{\alpha_r}) = c_{\alpha_1}\ell(u_{\alpha_1}) + \cdots + c_{\alpha_r}\ell(u_{\alpha_r}),$$

and therefore L is uniquely determined. For existence, define L by this formula.
Expanding u and v in this way, we readily see that $L(u + v) = L(u) + L(v)$ and
$L(cu) = cL(u)$. Therefore ℓ has a linear extension. □
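Here is a minimal sketch of the linear extension in Proposition 2.13 for $U = \mathbb{Q}^2$ and $V = \mathbb{Q}^3$. The basis `GAMMA`, the prescribed values `ELL`, and the helper names are hypothetical, and the coefficients of the unique expansion are found by Cramer's rule, a shortcut specific to this 2-by-2 case.

```python
from fractions import Fraction

GAMMA = [(1, 1), (1, -1)]                          # a hypothetical basis u_1, u_2 of Q^2
ELL = {GAMMA[0]: (2, 0, 1), GAMMA[1]: (0, 2, 1)}   # arbitrary values ℓ(u_1), ℓ(u_2) in Q^3


def coords(u):
    """Coefficients (c_1, c_2) with u = c_1 u_1 + c_2 u_2, by Cramer's rule."""
    (a, c), (b, d) = GAMMA             # u_1 = (a, c) and u_2 = (b, d) are the columns
    det = Fraction(a * d - b * c)
    return ((u[0] * d - b * u[1]) / det, (a * u[1] - c * u[0]) / det)


def L(u):
    """Linear extension of ℓ: expand u in GAMMA, then combine the prescribed values."""
    c1, c2 = coords(u)
    return tuple(c1 * x + c2 * y for x, y in zip(ELL[GAMMA[0]], ELL[GAMMA[1]]))
```

For instance, (3, 1) = 2(1, 1) + 1(1, −1), so `L((3, 1))` is 2(2, 0, 1) + (0, 2, 1) = (4, 2, 3), and L agrees with ℓ on the two basis vectors.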

The  definition  of  linearity  and  the  proposition  just  proved  make  sense  even  if 
U  and  V  are  infinite-dimensional,  but  our  objective  for  now  will  be  to  understand 
linear  maps  in  terms  of  matrices.  Thus,  until  further  notice  at  a  point  later  in  this 
section,  we  shall  assume  that  U  and  V  are  finite-dimensional.  Remarks  about  the 
infinite-dimensional  case  appear  in  Section  9. 

Since U and V are arbitrary finite-dimensional vector spaces, we no longer
have standard bases at hand, and thus we have no immediate way to associate a
matrix to a linear map $L : U \to V$. What we therefore do is fix arbitrary bases
of U and V and work with them. It will be important to have an enumeration of
each of these bases, and we therefore let

$$\Gamma = (u_1, \ldots, u_n) \quad\text{and}\quad \Delta = (v_1, \ldots, v_k)$$

be ordered bases of U and V, respectively.⁴ If a member u of U may be expanded
in terms of Γ as $u = c_1u_1 + \cdots + c_nu_n$, we write

$$\begin{pmatrix} u \\ \Gamma \end{pmatrix} = \begin{pmatrix} c_1 \\ \vdots \\ c_n \end{pmatrix},$$

calling this the column vector expressing u in the ordered basis Γ. Using our
linear map $L : U \to V$, let us define a k-by-n matrix $\begin{pmatrix} L \\ \Delta\Gamma \end{pmatrix}$ by requiring that
the jth column of $\begin{pmatrix} L \\ \Delta\Gamma \end{pmatrix}$ be $\begin{pmatrix} L(u_j) \\ \Delta \end{pmatrix}$.
The positions in which the ordered bases Δ and Γ are listed in the notation are
important here; the range basis is to the left of the domain basis.⁵

⁴ The notation $(u_1, \ldots, u_n)$ for an ordered basis, with each $u_j$ equal to a vector, is not to be
confused with the condensed notation $(c_1, \ldots, c_n)$ for a single column vector, with each $c_j$ equal to
a scalar.


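Example 2. Let V be the space of all complex-valued solutions on R of the
differential equation $y''(t) = y(t)$. Then V is a vector subspace of functions,
hence is a vector space in its own right. It is known that V is 2-dimensional with
solutions $c_1e^t + c_2e^{-t}$. If y(t) is a solution, then differentiation of the equation
shows that $y'(t)$ is another solution. In other words, the derivative operator d/dt is
a linear map from V to itself. One ordered basis of V is $\Gamma = (e^t, e^{-t})$, and another
is $\Delta = (\cosh t, \sinh t)$, where $\cosh t = \tfrac{1}{2}(e^t + e^{-t})$ and $\sinh t = \tfrac{1}{2}(e^t - e^{-t})$. To
find $\begin{pmatrix} d/dt \\ \Delta\Gamma \end{pmatrix}$, we need to express $(d/dt)(e^t)$ and $(d/dt)(e^{-t})$ in terms of cosh t
and sinh t. We have

$$\begin{pmatrix} (d/dt)(e^t) \\ \Delta \end{pmatrix} = \begin{pmatrix} e^t \\ \Delta \end{pmatrix} = \begin{pmatrix} \cosh t + \sinh t \\ \Delta \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$$

and

$$\begin{pmatrix} (d/dt)(e^{-t}) \\ \Delta \end{pmatrix} = \begin{pmatrix} -e^{-t} \\ \Delta \end{pmatrix} = \begin{pmatrix} -\cosh t + \sinh t \\ \Delta \end{pmatrix} = \begin{pmatrix} -1 \\ 1 \end{pmatrix}.$$

Therefore

$$\begin{pmatrix} d/dt \\ \Delta\Gamma \end{pmatrix} = \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}.$$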
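The computation in Example 2 can be replayed with coordinates. In this sketch (names hypothetical), a function $ae^t + be^{-t}$ in V is represented by its Γ-coordinates (a, b); differentiation sends it to $ae^t - be^{-t}$, and the identities $e^{\pm t} = \cosh t \pm \sinh t$ convert Γ-coordinates into Δ-coordinates (a + b, a − b). The final loop spot-checks numerically the second identity used above.

```python
import math


def d_dt(a, b):
    # Γ-coordinates (a, b) stand for a*e^t + b*e^(-t); its derivative is a*e^t - b*e^(-t)
    return (a, -b)


def gamma_to_delta(a, b):
    # a*e^t + b*e^(-t) = (a + b)*cosh t + (a - b)*sinh t
    return (a + b, a - b)


# column j is the Δ-coordinate vector of d/dt applied to the jth member of Γ
cols = [gamma_to_delta(*d_dt(*e)) for e in [(1, 0), (0, 1)]]
M = [[cols[j][i] for j in range(2)] for i in range(2)]

for t in (-1.0, 0.3, 2.0):
    # numerical spot check of -e^(-t) = -cosh t + sinh t
    assert math.isclose(-math.exp(-t), -math.cosh(t) + math.sinh(t))
```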


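Theorem 2.14. If $L : U \to V$ is a linear map between finite-dimensional
vector spaces over F and if Γ and Δ are ordered bases of U and V, respectively,
then

$$\begin{pmatrix} L(u) \\ \Delta \end{pmatrix} = \begin{pmatrix} L \\ \Delta\Gamma \end{pmatrix}\begin{pmatrix} u \\ \Gamma \end{pmatrix}$$

for all u in U.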

⁵ This order occurs in a number of analogous situations in mathematics and has the effect of
keeping the notation reasonably consistent with the notation for composition of functions.
PROOF. The two sides of the identity in question are linear in u, and Proposition
2.13 shows that it is enough to prove the identity for the members u of some
ordered basis of U. We choose Γ as this ordered basis. For the basis vector
u equal to the jth member $u_j$ of Γ, use of the definition shows that $\begin{pmatrix} u_j \\ \Gamma \end{pmatrix}$ is
the column vector $e_j$ that is 1 in the jth entry and is 0 elsewhere. The product
$\begin{pmatrix} L \\ \Delta\Gamma \end{pmatrix} e_j$ is the jth column of $\begin{pmatrix} L \\ \Delta\Gamma \end{pmatrix}$, which was defined to be $\begin{pmatrix} L(u_j) \\ \Delta \end{pmatrix}$. Thus
the identity in question is valid for $u_j$, and the theorem follows. □


If we take into account Proposition 2.13, saying that linear maps on U arise
uniquely from arbitrary functions on a basis of U, then Theorem 2.14 supplies
a one-one correspondence of linear maps L from U to V with matrices A of
the appropriate size, once we fix ordered bases in the domain and range. The
correspondence is $L \leftrightarrow \begin{pmatrix} L \\ \Delta\Gamma \end{pmatrix}$.

As  in  the  special  case  with  linear  maps  between  spaces  of  column  vectors, 
this  correspondence  respects  addition  and  scalar  multiplication.  Theorem  2.14 
implies  that  under  this  correspondence,  the  image  of  L  corresponds  to  the  column 
space of A. It implies also that the vector subspace of the domain U with $L(u) =
0$, which is called the kernel of L and is sometimes denoted by ker L, corresponds
to the null space of A. The kernel of L has the important property that

the  linear  map  L  is  one-one  if  and  only  if  ker  L  =  0. 

Another  important  property  comes  from  this  association  of  kernel  with  null  space 
and  of  image  with  column  space.  Namely,  we  apply  Theorem  2.6,  and  we  obtain 
the  following  corollary. 


Corollary 2.15. If $L : U \to V$ is a linear map between finite-dimensional
vector spaces over F, then

dim(domain(L)) = dim(kernel(L)) + dim(image(L)).

The  next  result  says  that  composition  corresponds  to  matrix  multiplication 
under  the  correspondence  of  Theorem  2.14. 

Theorem 2.16. Let $L : U \to V$ and $M : V \to W$ be linear maps between
finite-dimensional vector spaces, and let Γ, Δ, and Ω be ordered bases of U,
V, and W, respectively. Then the composition ML is linear, and the corresponding matrix is
given by

$$\begin{pmatrix} ML \\ \Omega\Gamma \end{pmatrix} = \begin{pmatrix} M \\ \Omega\Delta \end{pmatrix}\begin{pmatrix} L \\ \Delta\Gamma \end{pmatrix}.$$
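PROOF. If u is in U, three applications of Theorem 2.14 and one application
of associativity of matrix multiplication give

$$\begin{pmatrix} ML \\ \Omega\Gamma \end{pmatrix}\begin{pmatrix} u \\ \Gamma \end{pmatrix} = \begin{pmatrix} ML(u) \\ \Omega \end{pmatrix} = \begin{pmatrix} M \\ \Omega\Delta \end{pmatrix}\begin{pmatrix} L(u) \\ \Delta \end{pmatrix} = \begin{pmatrix} M \\ \Omega\Delta \end{pmatrix}\begin{pmatrix} L \\ \Delta\Gamma \end{pmatrix}\begin{pmatrix} u \\ \Gamma \end{pmatrix}.$$

Taking u to be the jth member of Γ, we see from this equation that the jth column
of $\begin{pmatrix} ML \\ \Omega\Gamma \end{pmatrix}$ equals the jth column of $\begin{pmatrix} M \\ \Omega\Delta \end{pmatrix}\begin{pmatrix} L \\ \Delta\Gamma \end{pmatrix}$. Since j is arbitrary, the
theorem follows. □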


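A computational device that at first seems to be only of theoretical interest,
but that becomes of practical interest when combined with other things, is to
change one of the ordered bases in computing the matrix of a linear map. A handy
device for this purpose is a change-of-basis matrix $\begin{pmatrix} I \\ \Delta\Gamma \end{pmatrix}$, where I denotes the
identity map, since Theorem 2.16 gives

$$\begin{pmatrix} L \\ \Delta\Gamma \end{pmatrix} = \begin{pmatrix} I \\ \Delta\Gamma \end{pmatrix}\begin{pmatrix} L \\ \Gamma\Gamma \end{pmatrix}.$$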

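Example 2, continued. Let L be d/dt as a linear map carrying the space of
solutions of $y''(t) = y(t)$ to itself, with $\Gamma = (e^t, e^{-t})$ and $\Delta = (\cosh t, \sinh t)$
as before. Then

$$\begin{pmatrix} L \\ \Gamma\Gamma \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$$

by inspection. Since $e^t = \cosh t + \sinh t$ and $e^{-t} = \cosh t - \sinh t$,

$$\begin{pmatrix} I \\ \Delta\Gamma \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$$

by inspection. The product is

$$\begin{pmatrix} L \\ \Delta\Gamma \end{pmatrix} = \begin{pmatrix} I \\ \Delta\Gamma \end{pmatrix}\begin{pmatrix} L \\ \Gamma\Gamma \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} = \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix},$$

a result we found before with a little more effort by computing matters directly.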
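The matrix arithmetic of Example 2, continued, can be checked exactly with rational entries. The sketch below (names hypothetical) forms the product of the two matrices written down by inspection, and also confirms that the change-of-basis matrix in the opposite direction, read off from $\cosh t = \tfrac12(e^t + e^{-t})$ and $\sinh t = \tfrac12(e^t - e^{-t})$, is its inverse.

```python
from fractions import Fraction


def matmul(B, A):
    """Product BA of matrices given as lists of rows."""
    return [[sum(B[i][l] * A[l][j] for l in range(len(A))) for j in range(len(A[0]))]
            for i in range(len(B))]


I_DG = [[1, 1], [1, -1]]      # columns: Δ-coordinates of e^t and of e^(-t)
L_GG = [[1, 0], [0, -1]]      # d/dt in the ordered basis Γ = (e^t, e^(-t))
L_DG = matmul(I_DG, L_GG)     # matrix of d/dt from Γ to Δ

half = Fraction(1, 2)
I_GD = [[half, half], [half, -half]]   # columns: Γ-coordinates of cosh t and of sinh t
```

The product `L_DG` has rows (1, −1) and (1, 1), matching the direct computation, and `matmul(I_GD, I_DG)` and `matmul(I_DG, I_GD)` are both the 2-by-2 identity matrix.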


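Often in practical applications the domain and the range are the same vector
space, the domain’s ordered basis equals the range’s ordered basis, and the matrix
of a linear map is known in this ordered basis. The problem is to determine the
matrix when the ordered basis is changed in both domain and range, changed in
such a way that the ordered bases in the domain and range are again the same. This time
we use two change-of-basis matrices $\begin{pmatrix} I \\ \Delta\Gamma \end{pmatrix}$ and $\begin{pmatrix} I \\ \Gamma\Delta \end{pmatrix}$, but these are related.
Since

$$\begin{pmatrix} I \\ \Gamma\Delta \end{pmatrix}\begin{pmatrix} I \\ \Delta\Gamma \end{pmatrix} = \begin{pmatrix} I \\ \Gamma\Gamma \end{pmatrix} = I \quad\text{and}\quad \begin{pmatrix} I \\ \Delta\Gamma \end{pmatrix}\begin{pmatrix} I \\ \Gamma\Delta \end{pmatrix} = \begin{pmatrix} I \\ \Delta\Delta \end{pmatrix} = I,$$

the two matrices are the inverses of one
another. Thus, except for matrix algebra, the problem is to compute just one of
$\begin{pmatrix} I \\ \Gamma\Delta \end{pmatrix}$ and $\begin{pmatrix} I \\ \Delta\Gamma \end{pmatrix}$.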

Normally one of these two matrices can be written down by inspection. For example, if we are working with a linear map from a space of column vectors to itself, one ordered basis of interest is the standard ordered basis $\Sigma$. Another ordered basis $\Delta$ might be determined by special features of the linear map. In this case the members of $\Delta$ are given as column vectors, hence are expressed in terms of $\Sigma$, and $\binom{I}{\Sigma\Delta}$ can be written by inspection: its columns are the members of $\Delta$. We shall encounter this situation later in this chapter when we use “eigenvectors” in order to understand linear maps better. Here is an example, but without eigenvectors.

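Example 1, continued. We saw that rotation $L$ counterclockwise about the origin through an angle $\theta$ in $\mathbb{R}^2$ is given in the standard ordered basis $\Sigma = \left(\begin{pmatrix}1\\0\end{pmatrix}, \begin{pmatrix}0\\1\end{pmatrix}\right)$ by
$$\binom{L}{\Sigma\Sigma} = \begin{pmatrix}\cos\theta & -\sin\theta \\ \sin\theta & \cos\theta\end{pmatrix}.$$
Let us compute the matrix of $L$ in the ordered basis $\Delta = \left(\begin{pmatrix}1\\0\end{pmatrix}, \begin{pmatrix}1\\1\end{pmatrix}\right)$. The easy change-of-basis matrix to form is $\binom{I}{\Sigma\Delta} = \begin{pmatrix}1 & 1\\0 & 1\end{pmatrix}$. Hence
$$\binom{L}{\Delta\Delta} = \binom{I}{\Delta\Sigma}\binom{L}{\Sigma\Sigma}\binom{I}{\Sigma\Delta} = \begin{pmatrix}1 & 1\\0 & 1\end{pmatrix}^{-1}\begin{pmatrix}\cos\theta & -\sin\theta \\ \sin\theta & \cos\theta\end{pmatrix}\begin{pmatrix}1 & 1\\0 & 1\end{pmatrix},$$
and the problem is reduced to one of matrix algebra.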
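The remaining matrix algebra can be sketched numerically as follows, assuming floating-point arithmetic is acceptable for illustration. The helper names are hypothetical; `closed_form` records the result of multiplying out by hand, namely $\begin{pmatrix}\cos\theta - \sin\theta & -2\sin\theta \\ \sin\theta & \sin\theta + \cos\theta\end{pmatrix}$.

```python
import math

def matmul(A, B):
    """Matrix product AB (matrices as lists of rows)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def inverse_2x2(m):
    """Inverse of a 2-by-2 matrix via the adjugate formula."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]

def rotation(theta):
    """Counterclockwise rotation through theta in the standard ordered basis."""
    return [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta), math.cos(theta)]]

I_SD = [[1, 1], [0, 1]]  # columns are the members (1,0) and (1,1) of Delta, written in Sigma

def rotation_in_delta(theta):
    """(I / Sigma Delta)^{-1} (L / Sigma Sigma) (I / Sigma Delta)."""
    return matmul(matmul(inverse_2x2(I_SD), rotation(theta)), I_SD)

def closed_form(theta):
    """The same matrix after carrying out the algebra by hand."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c - s, -2 * s], [s, s + c]]
```

One can confirm the answer independently: $L\begin{pmatrix}1\\0\end{pmatrix} = (\cos\theta - \sin\theta)\begin{pmatrix}1\\0\end{pmatrix} + \sin\theta\begin{pmatrix}1\\1\end{pmatrix}$, which is the first column.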

Our computations have proved the following proposition, which, as we shall see later, motivates much of Chapter V. The matrix $C$ in the statement of the proposition is $\binom{I}{\Gamma\Delta}$.

Proposition 2.17. Let $L : V \to V$ be a linear map on a finite-dimensional vector space, and let $A$ be the matrix of $L$ relative to an ordered basis $\Gamma$ (in domain and range). Then the matrix of $L$ in any other ordered basis $\Delta$ is of the form $C^{-1}AC$ for some invertible matrix $C$ depending on $\Delta$.

Remark. If $A$ is a square matrix, any square matrix of the form $C^{-1}AC$ is said to be similar to $A$. It is immediate that “is similar to” is an equivalence relation.

Now let us return to the setting in which our vector spaces are allowed to be infinite-dimensional. Two vector spaces $U$ and $V$ are said to be isomorphic if there is a one-one linear map of $U$ onto $V$. In this case, the linear map in question is called an isomorphism, and one often writes $U \cong V$.

Here is a finite-dimensional example: If $U$ is $n$-dimensional with an ordered basis $\Gamma$ and $V$ is $k$-dimensional with an ordered basis $\Delta$, then $\operatorname{Hom}_{\mathbb{F}}(U, V)$ is isomorphic to $M_{kn}(\mathbb{F})$ by the linear map that carries a member $L$ of $\operatorname{Hom}_{\mathbb{F}}(U, V)$ to the $k$-by-$n$ matrix $\binom{L}{\Delta\Gamma}$.

The relation “is isomorphic to” is an equivalence relation. In fact, it is reflexive since the identity map exhibits $U$ as isomorphic to itself. It is transitive since Theorem 2.16 shows that the composition $ML$ of two linear maps $L : U \to V$ and $M : V \to W$ is linear and since the composition of one-one onto functions is one-one onto. To see that it is symmetric, we need to observe that the inverse function $L^{-1}$ of a one-one onto linear map $L : U \to V$ is linear. To see this linearity, we observe that
$$L(L^{-1}(v_1) + L^{-1}(v_2)) = L(L^{-1}(v_1)) + L(L^{-1}(v_2)) = v_1 + v_2 = I(v_1 + v_2) = L(L^{-1}(v_1 + v_2)).$$
Since $L$ is one-one,
$$L^{-1}(v_1) + L^{-1}(v_2) = L^{-1}(v_1 + v_2).$$
Similarly the facts that $L(L^{-1}(cv)) = cv = cL(L^{-1}(v)) = L(c(L^{-1}(v)))$ and that $L$ is one-one imply that
$$L^{-1}(cv) = c(L^{-1}(v)),$$
and hence $L^{-1}$ is linear. Thus “is isomorphic to” is indeed an equivalence relation.

The vector spaces over $\mathbb{F}$ are partitioned, according to the basic result about
equivalence  relations  in  Section  A2  of  the  appendix,  into  equivalence  classes. 
Each  member  of  an  equivalence  class  is  isomorphic  to  all  other  members  of  that 
class  and  to  no  member  of  any  other  class. 

An  isomorphism  preserves  all  the  vector-space  structure  of  a  vector  space. 
Spanning  sets  are  mapped  to  spanning  sets,  linearly  independent  sets  are  mapped 
to  linearly  independent  sets,  vector  subspaces  are  mapped  to  vector  subspaces, 
dimensions  of  subspaces  are  preserved,  and  so  on.  In  other  words,  for  all  purposes 
of abstract vector-space theory, isomorphic vector spaces may be regarded as the
same.  Let  us  give  a  condition  for  isomorphism  that  might  at  first  seem  to  trivialize 
all  vector-space  theory,  reducing  it  to  a  count  of  dimensions,  but  then  let  us  return 
to  say  why  this  result  is  not  to  be  considered  as  so  important. 

Proposition  2.18.  Two  finite-dimensional  vector  spaces  over  F  are  isomorphic 
if  and  only  if  they  have  the  same  dimension. 

PROOF. If a vector space $U$ is isomorphic to a vector space $V$, then the isomorphism carries any basis of $U$ to a basis of $V$, and hence $U$ and $V$ have the same dimension. Conversely if they have the same dimension, let $(u_1, \dots, u_n)$ be an ordered basis of $U$, and let $(v_1, \dots, v_n)$ be an ordered basis of $V$. Define $l(u_j) = v_j$ for $1 \le j \le n$, and let $L : U \to V$ be the linear extension of $l$ given by Proposition 2.13. Then $L$ is linear, one-one, and onto, and hence $U$ is isomorphic to $V$. □
The proposition does not mean that one should necessarily be eager to make the identification of two vector spaces that are isomorphic. An important distinction is the one between “isomorphic” and “isomorphic via a canonically constructed linear map.” The isomorphism of linear maps with matrices given by $L \mapsto \binom{L}{\Delta\Gamma}$ is canonical since no choices are involved once $\Gamma$ and $\Delta$ have been specified. This is a useful isomorphism because we can track matters down and use the isomorphism to make computations. On the other hand, it is not very useful to say merely that $\operatorname{Hom}_{\mathbb{F}}(U, V)$ and $M_{kn}(\mathbb{F})$ are isomorphic because they have the same dimension.

What  tends  to  happen  in  practice  is  that  vector  spaces  in  applications  come 
equipped with additional structure, such as some rigid geometry, or a multiplication
operation,  or  something  else.  A  general  vector-space  isomorphism  has  little 
chance  of  having  any  connection  to  the  additional  structure  and  thereby  of  being 
very  helpful.  On  the  other  hand,  a  concrete  isomorphism  that  is  built  by  taking 
this  additional  structure  into  account  may  indeed  be  useful. 

In  the  next  section  we  shall  encounter  an  example  of  an  additional  structure 
that  involves  neither  a  rigid  geometry  nor  a  multiplication  operation.  We  shall 
introduce  the  “dual”  V'  of  a  vector  space  V,  and  we  shall  see  that  V  and  V'  have 
the  same  dimension  if  V  is  finite-dimensional.  But  no  particular  isomorphism 
of  V  with  V'  is  singled  out  as  better  than  other  ones,  and  it  is  wise  not  to  try 
to identify these spaces. By contrast, the double dual $V''$ of $V$, which too will be constructed in the next section, will be seen to be isomorphic to $V$ in the finite-dimensional case via a linear map $\iota : V \to V''$ that we define explicitly. The function $\iota$ is an example of a canonical isomorphism that we might want to exploit.


4.  Dual  Spaces 

Let  V  be  a  vector  space  over  F.  A  linear  functional  on  V  is  a  linear  map  from 
V  into  F.  The  space  of  all  such  linear  maps,  as  we  saw  in  Section  3,  is  a  vector 
space.  We  denote  it  by  V'  and  call  it  the  dual  space  of  V. 

The development of Section 3 tells us right away how to compute the dual space of the space of column vectors $\mathbb{F}^n$. If $\Sigma$ is the standard ordered basis of $\mathbb{F}^n$ and if $1$ denotes the basis of $\mathbb{F}$ consisting of the scalar 1, then we can associate to a linear functional $v'$ on $\mathbb{F}^n$ its matrix
$$\binom{v'}{1\Sigma} = \begin{pmatrix} v'(e_1) & v'(e_2) & \cdots & v'(e_n) \end{pmatrix},$$
which is an $n$-dimensional row vector. The operation of $v'$ on a column vector $v = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}$ is given by Theorem 2.14. Namely, $v'(v)$ is a multiple of the scalar 1, and the theorem tells us how to compute this multiple:
$$\binom{v'(v)}{1} = \binom{v'}{1\Sigma}\binom{v}{\Sigma} = \begin{pmatrix} v'(e_1) & v'(e_2) & \cdots & v'(e_n) \end{pmatrix}\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}.$$
Thus the space of all linear functionals on $\mathbb{F}^n$ may be identified with the space of all $n$-dimensional row vectors, and the effect of the row vector on a column vector is given by matrix multiplication. Since the standard ordered basis of $\mathbb{F}^n$ and the basis 1 of $\mathbb{F}$ are singled out as special, this identification is actually canonical, and it is thus customary to make this identification without further comment.
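The identification of functionals on $\mathbb{F}^n$ with row vectors can be sketched directly: read off the row from the values on $e_1, \dots, e_n$, then evaluate by a row-times-column product. The functional `v_prime` below is a hypothetical example, not one from the text.

```python
def row_of(functional, n):
    """The row vector (v'(e_1), ..., v'(e_n)) of a linear functional on F^n."""
    return [functional([1 if i == j else 0 for i in range(n)]) for j in range(n)]

def evaluate(row, column):
    """Row vector times column vector: the scalar v'(v)."""
    return sum(r * x for r, x in zip(row, column))

# Hypothetical example functional on F^3.
def v_prime(x):
    return 3 * x[0] - x[1] + 2 * x[2]

row = row_of(v_prime, 3)
v = [1, 4, 5]
print(row, evaluate(row, v), v_prime(v))
```

This prints `[3, -1, 2] 9 9`: the matrix product reproduces the value of the functional, as Theorem 2.14 guarantees.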

For  a  more  general  vector  space  V,  no  natural  way  of  writing  down  elements 
of  V'  comes  to  mind.  Indeed,  if  a  concrete  V  is  given,  it  can  help  considerably 
in  understanding  V  to  have  an  identification  of  V'  that  does  not  involve  choices. 
For  example,  in  real  analysis  one  proves  in  a  suitable  infinite-dimensional  setting 
that  a  (continuous)  linear  functional  on  the  space  of  integrable  functions  is  given 
by  integration  with  a  bounded  function,  and  that  fact  simplifies  the  handling  of 
the  space  of  integrable  functions. 

In any event, the canonical identification of linear functionals that we found for $\mathbb{F}^n$ does not work once we pass to a more general finite-dimensional vector space $V$. To make such an identification in the absence of additional structure, we first fix an ordered basis $(v_1, \dots, v_n)$ of $V$. If we do so, then $V'$ is indeed identified with the space of $n$-dimensional row vectors. The members of $V'$ that correspond to the standard basis of row vectors, i.e., the row vectors that are 1 in one entry and are 0 elsewhere, are of special interest. These are the linear functionals $v'_i$ such that
$$v'_i(v_j) = \delta_{ij},$$
where $\delta_{ij}$ is the Kronecker delta. Since these standard row vectors form a basis of the space of row vectors, $(v'_1, \dots, v'_n)$ is an ordered basis of $V'$. If the members of the ordered basis $(v_1, \dots, v_n)$ are permuted in some way, the members of $(v'_1, \dots, v'_n)$ are permuted in the same way. Thus the basis $\{v'_1, \dots, v'_n\}$ depends only on the basis $\{v_1, \dots, v_n\}$, not on the enumeration.⁶ The basis $\{v'_1, \dots, v'_n\}$ is called the dual basis of $V'$ relative to $\{v_1, \dots, v_n\}$. A consequence of this discussion is the following result.

Proposition  2.19.  If  V  is  a  finite-dimensional  vector  space  with  dual  V',  then 
V'  is  finite-dimensional  with  dim  V'  =  dim  V. 
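For $V = \mathbb{Q}^n$ the dual basis described before Proposition 2.19 can be computed concretely. As a sketch, assuming exact rational arithmetic: if $B$ is the matrix whose columns are $v_1, \dots, v_n$, then the rows of $B^{-1}$ satisfy $(\text{row } i)(\text{column } j) = \delta_{ij}$, so they are the row vectors of $v'_1, \dots, v'_n$. The helpers `inverse` and `dual_basis` are hypothetical.

```python
from fractions import Fraction

def inverse(m):
    """Inverse of an invertible square matrix, by Gauss-Jordan elimination over Q."""
    n = len(m)
    # Augment m with the identity matrix.
    a = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(m)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [x / p for x in a[col]]
        for r in range(n):
            if r != col and a[r][col] != 0:
                factor = a[r][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return [row[n:] for row in a]

def dual_basis(basis):
    """Row vectors of the dual basis for a basis of Q^n given as a list of vectors."""
    n = len(basis)
    b = [[basis[j][i] for j in range(n)] for i in range(n)]  # basis vectors as columns
    return inverse(b)

basis = [[1, 0, 0], [1, 1, 0], [1, 1, 1]]
duals = dual_basis(basis)
```

Here `duals` is `[[1, -1, 0], [0, 1, -1], [0, 0, 1]]` (as `Fraction` entries). Permuting `basis` permutes the rows of `duals` in the same way, matching the remark that the dual basis does not depend on the enumeration.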

⁶Although the enumeration is not important, more structure is present here than simply an association of an unordered basis of $V'$ to an unordered basis of $V$. Each member of $\{v'_1, \dots, v'_n\}$ is matched to a particular member of $\{v_1, \dots, v_n\}$, namely the one on which it takes the value 1.
Linear functionals play an important role in working with a vector space. To
understand  this  role,  it  is  helpful  to  think  somewhat  geometrically.  Imagine  the 
problem  of  describing  a  vector  subspace  of  a  given  vector  space.  One  way  of 
describing  it  is  from  the  inside,  so  to  speak,  by  giving  a  spanning  set.  In  this 
case  we  end  up  by  describing  the  subspace  in  terms  of  parameters,  the  parameters 
being  the  scalar  coefficients  when  we  say  that  the  subspace  is  the  set  of  all  finite 
linear  combinations  of  members  of  the  spanning  set.  Another  way  of  describing 
the  subspace  is  from  the  outside,  cutting  it  down  by  conditions  imposed  on  its 
elements.  These  conditions  tend  to  be  linear  equations,  saying  that  certain  linear 
maps  on  the  elements  of  the  subspace  give  0.  Typically  the  subspace  is  then 
described  as  the  intersection  of  the  kernels  of  some  set  of  linear  maps.  Frequently 
these linear maps will be scalar-valued, and then we are in a situation of describing
the  subspace  by  a  set  of  linear  functionals. 

We  know  that  every  vector  subspace  of  a  finite-dimensional  vector  space  V 
can  be  described  from  the  inside  in  this  way;  we  merely  give  all  its  members.  A 
statement with more content is that we can describe it with finitely many members;
we  can  do  so  because  we  know  that  every  vector  subspace  of  V  has  a  basis. 

For linear functionals really to be useful, we would like to know a corresponding fact about describing subspaces from the outside: that every vector subspace $U$ of a finite-dimensional $V$ can be described as the intersection of the kernels of a finite set of linear functionals. To do so is easy. We take a basis of the vector subspace $U$, say $\{v_1, \dots, v_r\}$, extend it to a basis of $V$ by adjoining vectors $v_{r+1}, \dots, v_n$, and form the dual basis $\{v'_1, \dots, v'_n\}$ of $V'$. The subspace $U$ is then described as the set of all vectors $v$ in $V$ such that $v'_j(v) = 0$ for $r + 1 \le j \le n$. The following proposition expresses this fact in ways that are independent of the choice of a basis. It uses the terminology annihilator of $U$, denoted by $\operatorname{Ann}(U)$, for the vector subspace of all members $v'$ of $V'$ with $v'(u) = 0$ for all $u$ in $U$.

Proposition  2.20.  Let  V  be  a  finite-dimensional  vector  space,  and  let  U  be  a 
vector  subspace  of  V.  Then 

(a) $\dim U + \dim \operatorname{Ann}(U) = \dim V$,

(b) every linear functional on $U$ extends to a linear functional on $V$,

(c) whenever $v_0$ is a member of $V$ that is not in $U$, there exists a linear functional on $V$ that is 0 on $U$ and is 1 on $v_0$.

PROOF. We retain the notation above, writing $\{v_1, \dots, v_r\}$ for a basis of $U$, $v_{r+1}, \dots, v_n$ for vectors that are adjoined to form a basis of $V$, and $\{v'_1, \dots, v'_n\}$ for the dual basis of $V'$. For (a), we check that $\{v'_{r+1}, \dots, v'_n\}$ is a basis of $\operatorname{Ann}(U)$. Since these functionals are part of a basis of $V'$, they are linearly independent, and it is enough to see that they span $\operatorname{Ann}(U)$. These linear functionals are 0 on every member of the basis $\{v_1, \dots, v_r\}$ of $U$ and hence are in $\operatorname{Ann}(U)$. On the other hand, if $v'$ is a member of $\operatorname{Ann}(U)$, we can certainly write $v' = c_1 v'_1 + \cdots + c_n v'_n$ for some scalars $c_1, \dots, c_n$. Since $v'$ is 0 on $U$, we must have $v'(v_i) = 0$ for $i \le r$. Since $v'(v_i) = c_i$, we obtain $c_i = 0$ for $i \le r$. Therefore $v'$ is a linear combination of $v'_{r+1}, \dots, v'_n$, and (a) is proved.

For (b), let us observe that the restrictions $v'_1|_U, \dots, v'_r|_U$ form the dual basis of $U'$ relative to the basis $\{v_1, \dots, v_r\}$ of $U$. If $u'$ is in $U'$, we can therefore write $u' = c_1 v'_1|_U + \cdots + c_r v'_r|_U$ for some scalars $c_1, \dots, c_r$. Then $v' = c_1 v'_1 + \cdots + c_r v'_r$ is the required extension of $u'$ to all of $V$.

For (c), we use a special choice of basis of $V$ in the argument above. Namely, we still take $\{v_1, \dots, v_r\}$ to be a basis of $U$, and then we let $v_{r+1} = v_0$; the set $\{v_1, \dots, v_r, v_0\}$ is linearly independent because $v_0$ is not in $U$. Finally we adjoin $v_{r+2}, \dots, v_n$ to obtain a basis $\{v_1, \dots, v_n\}$ of $V$. Then $v'_{r+1}$ has the required property. □
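The recipe used in this proof (extend a basis of $U$, form the dual basis, keep the last $n - r$ functionals) can be sketched in $\mathbb{Q}^3$, assuming exact rational arithmetic and representing functionals as row vectors. The subspace `U_basis`, the extension vectors, and the helper names are hypothetical; the extension is assumed to complete a basis, which `inverse` effectively checks by failing on a singular matrix.

```python
from fractions import Fraction

def inverse(m):
    """Inverse of an invertible square matrix, by Gauss-Jordan elimination over Q."""
    n = len(m)
    a = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(m)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            raise ValueError("vectors do not form a basis")
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [x / p for x in a[col]]
        for r in range(n):
            if r != col and a[r][col] != 0:
                factor = a[r][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return [row[n:] for row in a]

def annihilator_basis(u_basis, extension):
    """Row vectors of v'_{r+1}, ..., v'_n for a basis of U extended to a basis of Q^n."""
    full = u_basis + extension
    n = len(full)
    b = [[full[j][i] for j in range(n)] for i in range(n)]  # basis vectors as columns
    return inverse(b)[len(u_basis):]

def in_subspace(v, functionals):
    """v is in U exactly when every functional in a basis of Ann(U) vanishes on v."""
    return all(sum(f_k * v_k for f_k, v_k in zip(f, v)) == 0 for f in functionals)

U_basis = [[1, 0, 1], [0, 1, 1]]  # hypothetical 2-dimensional U in Q^3
ann = annihilator_basis(U_basis, [[0, 0, 1]])
```

Here `ann` is the single row $(-1, -1, 1)$, so $U$ is the kernel of $(x, y, z) \mapsto z - x - y$, and $\dim U + \dim \operatorname{Ann}(U) = 2 + 1 = 3$ as in (a). Choosing the extension vector to be some $v_0 \notin U$, as in the proof of (c), yields a functional that is 0 on $U$ and 1 on $v_0$.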

If $L : U \to V$ is a linear map between finite-dimensional vector spaces, then the formula
$$(L'(v'))(u) = v'(L(u)) \quad\text{for } u \in U \text{ and } v' \in V'$$
defines a linear map $L' : V' \to U'$. The linear map $L'$ is called the contragredient of $L$. The matrix of the contragredient of $L$ is the transpose of the matrix of $L$ in the following sense.⁷

Proposition 2.21. Let $L : U \to V$ be a linear map between finite-dimensional vector spaces, let $L' : V' \to U'$ be its contragredient, let $\Gamma$ and $\Delta$ be respective ordered bases of $U$ and $V$, and let $\Gamma'$ and $\Delta'$ be their dual ordered bases. Then
$$\binom{L'}{\Gamma'\Delta'} = \binom{L}{\Delta\Gamma}^{t}.$$

PROOF. Let $\Gamma = (u_1, \dots, u_n)$, $\Delta = (v_1, \dots, v_k)$, $\Gamma' = (u'_1, \dots, u'_n)$, and $\Delta' = (v'_1, \dots, v'_k)$. Write $B$ and $A$ for the respective matrices $\binom{L'}{\Gamma'\Delta'}$ and $\binom{L}{\Delta\Gamma}$ in the formula in question. The equations $L(u_j) = \sum_{i'=1}^{k} A_{i'j} v_{i'}$ and $L'(v'_i) = \sum_{j'=1}^{n} B_{j'i} u'_{j'}$ imply that
$$v'_i(L(u_j)) = v'_i\Big(\sum_{i'=1}^{k} A_{i'j} v_{i'}\Big) = A_{ij} \quad\text{and}\quad L'(v'_i)(u_j) = \sum_{j'=1}^{n} B_{j'i}\, u'_{j'}(u_j) = B_{ji}.$$
Therefore $B_{ji} = L'(v'_i)(u_j) = v'_i(L(u_j)) = A_{ij}$, as required. □
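A small sketch of Proposition 2.21, assuming $U = \mathbb{F}^n$ and $V = \mathbb{F}^k$ with standard ordered bases, so that $v'_i$ simply reads off the $i$th coordinate. It builds the matrix of $L'$ exactly as in the proof: column $i$ lists the values $L'(v'_i)(u_j) = v'_i(L(u_j))$. The function names and the sample matrix are hypothetical.

```python
def apply(A, x):
    """The column vector Ax (A given as a list of rows)."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def contragredient_matrix(A):
    """Matrix of L' : V' -> U' in the dual standard bases, for L = A : F^n -> F^k.

    Column i holds the coordinates of L'(v'_i) = v'_i o L in the dual basis
    (u'_1, ..., u'_n); those coordinates are the values L'(v'_i)(u_j).
    """
    k, n = len(A), len(A[0])
    columns = []
    for i in range(k):
        def pulled_back(x, i=i):
            return apply(A, x)[i]  # v'_i(L(x)), where v'_i reads off coordinate i
        columns.append([pulled_back([1 if m == j else 0 for m in range(n)])
                        for j in range(n)])
    return [[columns[i][j] for i in range(k)] for j in range(n)]

A = [[1, 2], [3, 4], [5, 6]]  # hypothetical L : F^2 -> F^3 in the standard bases
print(contragredient_matrix(A))
```

This prints `[[1, 3, 5], [2, 4, 6]]`, the transpose of `A`.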

⁷A general principle is involved in the definition of contragredient once we have a definition of dual vector space, and we shall see further examples of this principle in the next two sections and in later chapters: whenever a new systematic construction appears for the objects under study, it is well to look for a corresponding construction with the functions relating these new objects. In language to be introduced near the end of Chapter IV, the context for the construction will be a “category,” and the principle says that it is well to see whether the construction is that of a “functor” on the category.
With $V$ finite-dimensional, now consider $V'' = (V')'$, the double dual. In the case that $V = \mathbb{F}^n$, we saw that $V'$ could be viewed as the space of row vectors, and it is reasonable to expect $V''$ to involve a second transpose and again be the space of column vectors. If so, then $V$ gets identified with $V''$. In fact, this is true in all cases, and we argue as follows. If $v$ is in $V$, we can define a member $\iota(v)$ of $V''$ by
$$\iota(v)(v') = v'(v) \quad\text{for } v \in V \text{ and } v' \in V'.$$
This definition makes sense whether or not $V$ is finite-dimensional. The function $\iota$ is a linear map from $V$ into $V''$ called the canonical map of $V$ into $V''$. It is independent of any choice of basis.
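The definition of $\iota$ is "evaluate at $v$," and no basis enters. As a minimal sketch, assuming functionals on $\mathbb{Q}^2$ are modeled as Python callables, $\iota(v)$ is a closure that feeds $v$ to whatever functional it receives. The functionals `f` and `g` are hypothetical examples.

```python
def iota(v):
    """Canonical map V -> V'': iota(v) is the functional v' |-> v'(v) on V'."""
    def evaluate_at_v(v_prime):
        return v_prime(v)
    return evaluate_at_v

# Hypothetical functionals on Q^2, represented as callables.
f = lambda x: 2 * x[0] + x[1]
g = lambda x: x[0] - 3 * x[1]
v, w = (1, 4), (-2, 5)
print(iota(v)(f))
```

This prints `6`, which is $f(v)$. The code never chooses a basis, mirroring the remark that $\iota$ is independent of such choices; linearity of $\iota(v)$ in $v'$, and of $\iota$ in $v$, can be checked on these samples.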

Proposition 2.22. If $V$ is any finite-dimensional vector space over $\mathbb{F}$, then the canonical map $\iota : V \to V''$ is one-one onto.

Remarks. In the infinite-dimensional case the canonical map is one-one but it is not onto. The proof that it is one-one uses the fact that $V$ has a basis, but we have deferred the proof of this fact about infinite-dimensional vector spaces to Section 9. Problem 14 at the end of the chapter will give an example of an infinite-dimensional $V$ for which $\iota$ does not carry $V$ onto $V''$. When combined with the first corollary in Section A6 of the appendix, this example shows that $\iota$ never carries $V$ onto $V''$ in the infinite-dimensional case.

PROOF. We saw in Section 3 that a linear map $L$ is one-one if and only if $\ker L = 0$. Thus suppose $\iota(v) = 0$. Then $0 = \iota(v)(v') = v'(v)$ for all $v'$. Arguing by contradiction, suppose $v \ne 0$. Then we can extend $\{v\}$ to a basis of $V$, and the linear functional $v'$ that is 1 on $v$ and is 0 on the other members of the basis will have $v'(v) \ne 0$, contradiction. We conclude that $\iota$ is one-one. By Proposition 2.19 we have
$$\dim V = \dim V' = \dim V''. \qquad (*)$$
Since $\iota$ is one-one, it carries any basis of $V$ to a linearly independent set in $V''$. This linearly independent set has to be a basis, by Corollary 2.4 and the dimension formula $(*)$. □


5.  Quotients  of  Vector  Spaces 

This section constructs a vector space $V/U$ out of a vector space $V$ and a vector subspace $U$. We begin with the example illustrated in Figure 2.1. In the vector space $V = \mathbb{R}^2$, let $U$ be a line through the origin. The lines parallel to $U$ are of the form $v + U = \{v + u \mid u \in U\}$, and we make the set of these lines into a vector space by defining $(v_1 + U) + (v_2 + U) = (v_1 + v_2) + U$ and $c(v + U) = cv + U$. The figure suggests that if we were to take any other line $W$ through the origin, then $W$ would meet all the lines $v + U$, and the notion of addition of lines $v + U$ would correspond exactly to addition in $W$. Indeed we can successfully make such a correspondence, but the advantage of introducing the vector space of all lines $v + U$ is that it is canonical, independent of the kind of choice we have to make in selecting $W$. One example of the utility of having a canonical construction is the ease with which we obtain correspondence of linear maps stated in Proposition 2.25 below. Other examples will appear later.


Figure 2.1. The vector space of lines $v + U$ in $\mathbb{R}^2$ parallel to a given line $U$ through the origin.

Proposition 2.23. Let $V$ be a vector space over $\mathbb{F}$, and let $U$ be a vector subspace. The relation defined by saying that $v_1 \sim v_2$ if $v_1 - v_2$ is in $U$ is an equivalence relation, and the equivalence classes are all sets of the form $v + U$ with $v \in V$. The set of equivalence classes $V/U$ is a vector space under the definitions
$$(v_1 + U) + (v_2 + U) = (v_1 + v_2) + U,$$
$$c(v + U) = cv + U,$$
and the function $q(v) = v + U$ is linear from $V$ onto $V/U$ with kernel $U$.

Remarks.  We  say  that  V/U  is  the  quotient  space  of  V  by  U.  The  linear  map 
q(v)  =  v  +  U  is  called  the  quotient  map  of  V  onto  V/U. 

PROOF. The properties of an equivalence relation are established as follows:

$v_1 \sim v_1$ because 0 is in $U$,

$v_1 \sim v_2$ implies $v_2 \sim v_1$ because $U$ is closed under negatives,

$v_1 \sim v_2$ and $v_2 \sim v_3$ together imply $v_1 \sim v_3$ because $U$ is closed under addition.

Thus we have equivalence classes. The class of $v_1$ consists of all vectors $v_2$ such that $v_2 - v_1$ is in $U$, hence consists of all vectors in $v_1 + U$. Thus the equivalence classes are indeed the sets $v + U$.
Let us check that addition and scalar multiplication, as given in the statement of the proposition, are well defined. For addition let $v_1 \sim w_1$ and $v_2 \sim w_2$. Then $v_1 - w_1$ and $v_2 - w_2$ are in $U$. Since $U$ is a vector subspace, the sum $(v_1 - w_1) + (v_2 - w_2) = (v_1 + v_2) - (w_1 + w_2)$ is in $U$. Thus $v_1 + v_2 \sim w_1 + w_2$, and addition is well defined. For scalar multiplication let $v \sim w$, and let a scalar $c$ be given. Then $v - w$ is in $U$, and $c(v - w) = cv - cw$ is in $U$ since $U$ is a vector subspace. Hence $cv \sim cw$, and scalar multiplication is well defined.

The vector-space properties of $V/U$ are consequences of the properties for $V$. To illustrate, consider associativity of addition. The argument in this case is that
$$\begin{aligned}((v_1 + U) + (v_2 + U)) + (v_3 + U) &= ((v_1 + v_2) + U) + (v_3 + U) \\ &= ((v_1 + v_2) + v_3) + U = (v_1 + (v_2 + v_3)) + U \\ &= (v_1 + U) + ((v_2 + v_3) + U) = (v_1 + U) + ((v_2 + U) + (v_3 + U)).\end{aligned}$$
Finally the quotient map $q : V \to V/U$ given by $q(v) = v + U$ is certainly linear. Its kernel is $\{v \mid v + U = 0 + U\}$, and this equals $\{v \mid v \in U\}$, as asserted. The map $q$ is onto $V/U$ since $v + U = q(v)$. □
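For a concrete feel for $V/U$, here is a sketch of the example of Figure 2.1 with $V = \mathbb{Q}^2$ and $U$ the line spanned by $(1, 1)$; this choice of $U$, and the convention of storing each coset $v + U$ by its unique member with first coordinate 0, are assumptions made for illustration. The operations are computed on representatives, exactly as in the definitions of the proposition.

```python
# Hypothetical concrete case: V = Q^2 and U = the line spanned by (1, 1).

def equivalent(v1, v2):
    """v1 ~ v2 exactly when v1 - v2 lies in U, i.e. the difference has equal coordinates."""
    return v1[0] - v2[0] == v1[1] - v2[1]

def q(v):
    """Quotient map: the coset v + U is stored as its unique member with first coordinate 0."""
    return (0, v[1] - v[0])

def coset_add(a, b):
    """(v1 + U) + (v2 + U) = (v1 + v2) + U, computed on representatives."""
    return q((a[0] + b[0], a[1] + b[1]))

def coset_scale(c, a):
    """c(v + U) = cv + U, computed on a representative."""
    return q((c * a[0], c * a[1]))
```

Equivalent vectors such as $(2, 5)$ and $(4, 7)$ have the same stored coset, and adding or scaling cosets gives the same answer whichever representatives are used. Since each coset is determined by the single number $v_2 - v_1$, this model also exhibits $\dim V/U = 1 = 2 - \dim U$, in line with Corollary 2.24(a).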

Corollary 2.24. If $V$ is a vector space over $\mathbb{F}$ and $U$ is a vector subspace, then

(a) $\dim V = \dim U + \dim(V/U)$,

(b) the subspace $U$ is the kernel of some linear map defined on $V$.

Remark. The first conclusion is valid even when not all the spaces are finite-dimensional. For current purposes it is sufficient to regard $\dim V$ as $+\infty$ if $V$ is infinite-dimensional; the sum of $+\infty$ and any dimension is to be regarded as $+\infty$.

PROOF. Let $q$ be the quotient map. The linear map $q$ meets the conditions of (b). For (a), take a basis of $U$ and extend to a basis of $V$. Then the images under $q$ of the additional vectors form a basis of $V/U$. □

Quotients  of  vector  spaces  allow  for  the  factorization  of  certain  linear  maps, 
as  indicated  in  Proposition  2.25  and  Figure  2.2. 

Proposition 2.25. Let $L : V \to W$ be a linear map between vector spaces over $\mathbb{F}$, let $U_0 = \ker L$, let $U$ be a vector subspace of $V$ contained in $U_0$, and let $q : V \to V/U$ be the quotient map. Then there exists a linear map $\bar L : V/U \to W$ such that $L = \bar L q$. It has the same image as $L$, and $\ker \bar L = \{u_0 + U \mid u_0 \in U_0\}$.

Figure 2.2. Factorization of linear maps via a quotient of vector spaces. (Diagram: $L : V \to W$, $q : V \to V/U$, and $\bar L : V/U \to W$, with $L = \bar L q$.)
Remark. One says that $L$ factors through $V/U$ or descends to $V/U$.

PROOF. The definition of $\bar L$ has to be $\bar L(v + U) = L(v)$. This forces $\bar L q = L$, and $\bar L$ will have to be linear. What needs proof is that $\bar L$ is well defined. Thus suppose $v_1 \sim v_2$. We are to prove that $\bar L(v_1 + U) = \bar L(v_2 + U)$, i.e., that $L(v_1) = L(v_2)$. Now $v_1 - v_2$ is in $U \subseteq U_0$, and hence $L(v_1 - v_2) = 0$. Then $L(v_1) = L(v_1 - v_2) + L(v_2) = L(v_2)$, as required. This proves that $\bar L$ is well defined, and the conclusions about the image and the kernel of $\bar L$ are immediate from the definition. □
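A concrete instance of Proposition 2.25 can be sketched as follows, assuming $V = \mathbb{Q}^3$, $W = \mathbb{Q}$, and the hypothetical map $L(x, y, z) = x + y$, so that $U_0 = \ker L$ is spanned by $(1, -1, 0)$ and $(0, 0, 1)$; we take the smaller subspace $U$ spanned by $(1, -1, 0)$. Each coset $v + U$ is stored by its unique member with second coordinate 0, and $\bar L$ is evaluated on that representative.

```python
# Hypothetical concrete case: V = Q^3, W = Q, L(x, y, z) = x + y,
# so U_0 = ker L = span{(1,-1,0), (0,0,1)}; we take U = span{(1,-1,0)}, inside U_0.

def L(v):
    x, y, z = v
    return x + y

def q(v):
    """Quotient map onto V/U: the coset v + U is stored as its unique member with y = 0."""
    x, y, z = v
    return (x + y, 0, z)

def L_bar(coset):
    """Induced map on V/U: evaluate L at the stored representative."""
    return L(coset)
```

On every sample vector, `L_bar(q(v)) == L(v)`, which is the factorization $L = \bar L q$; the stored coset does not change when a multiple of $(1, -1, 0)$ is added to $v$; and `L_bar` vanishes exactly on the cosets $u_0 + U$ with $u_0 \in U_0$, here the cosets of the vectors $(0, 0, z)$.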

Corollary 2.26. Let $L : V \to W$ be a linear map between vector spaces over $\mathbb{F}$, and suppose that $L$ is onto $W$ and has kernel $U$. Then $V/U$ is canonically isomorphic to $W$.

PROOF. Take $U = U_0$ in Proposition 2.25, and form $\bar L : V/U \to W$ with $L = \bar L q$. The proposition shows that $\bar L$ is onto $W$ and has trivial kernel, i.e., kernel equal to the 0 element of $V/U$. Having trivial kernel, $\bar L$ is one-one. □

Theorem 2.27 (First Isomorphism Theorem). Let $L : V \to W$ be a linear map between vector spaces over $\mathbb{F}$, and suppose that $L$ is onto $W$ and has kernel $U$. Then the map $S \mapsto L(S)$ gives a one-one correspondence between

(a) the vector subspaces $S$ of $V$ containing $U$ and

(b) the vector subspaces of $W$.

Remark. As in Section A1 of the appendix, we write $L(S)$ and $L^{-1}(T)$ to indicate the direct and inverse images of $S$ and $T$, respectively.

PROOF. The passage from (a) to (b) is by direct image under $L$, and the passage from (b) to (a) will be by inverse image under $L^{-1}$. Certainly the direct image of a vector subspace as in (a) is a vector subspace as in (b). We are to show that the inverse image of a vector subspace as in (b) is a vector subspace as in (a) and that these two procedures invert one another.

For any vector subspace $T$ of $W$, $L^{-1}(T)$ is a vector subspace of $V$. In fact, if $v_1$ and $v_2$ are in $L^{-1}(T)$, we can write $L(v_1) = t_1$ and $L(v_2) = t_2$ with $t_1$ and $t_2$ in $T$. Then the equations $L(v_1 + v_2) = t_1 + t_2$ and $L(cv_1) = cL(v_1) = ct_1$ show that $v_1 + v_2$ and $cv_1$ are in $L^{-1}(T)$.

Moreover, the vector subspace $L^{-1}(T)$ contains $L^{-1}(0) = U$. Therefore the inverse image under $L$ of a vector subspace as in (b) is a vector subspace as in (a). Since $L$ is a function onto $W$, we have $L(L^{-1}(T)) = T$. Thus passing from (b) to (a) and back recovers the vector subspace of $W$.

If $S$ is a vector subspace of $V$ containing $U$, we still need to see that $S = L^{-1}(L(S))$. Certainly $S \subseteq L^{-1}(L(S))$. In the reverse direction let $v$ be in $L^{-1}(L(S))$. Then $L(v)$ is in $L(S)$, i.e., $L(v) = L(s)$ for some $s$ in $S$. Since $L$ is linear, $L(v - s) = 0$. Thus $v - s$ is in $\ker L = U$, which is contained in $S$ by assumption. Then $s$ and $v - s$ are in $S$, and hence $v$ is in $S$. We conclude that $L^{-1}(L(S)) \subseteq S$, and thus passing from (a) to (b) and then back recovers the vector subspace of $V$ containing $U$. □

If $V$ is a vector space and $V_1$ and $V_2$ are vector subspaces, then we write $V_1 + V_2$ for the set of all sums $v_1 + v_2$ with $v_1 \in V_1$ and $v_2 \in V_2$. This is again a vector subspace of $V$ and is called the sum of $V_1$ and $V_2$. If we have vector subspaces $V_1, \dots, V_n$, we abbreviate $(\cdots((V_1 + V_2) + V_3) + \cdots + V_n)$ as $V_1 + \cdots + V_n$.


Theorem 2.28 (Second Isomorphism Theorem). Let $M$ and $N$ be vector subspaces of a vector space $V$ over $\mathbb{F}$. Then the map $n + (M \cap N) \mapsto n + M$ is a well-defined canonical vector-space isomorphism
$$N/(M \cap N) \cong (M + N)/M.$$

PROOF. The function $L(n + (M \cap N)) = n + M$ is well defined since $M \cap N \subseteq M$, and $L$ is linear. The domain of $L$ is $\{n + (M \cap N) \mid n \in N\}$, and the kernel is the subset of this where $n$ lies in $M$ as well as $N$. For this to happen, $n$ must be in $M \cap N$, and thus the kernel is the 0 element of $N/(M \cap N)$. Hence $L$ is one-one.

To see that $L$ is onto $(M + N)/M$, let $(m + n) + M$ be given. Then $n + (M \cap N)$ maps to $n + M$, which equals $(m + n) + M$. Hence $L$ is onto. □

Corollary 2.29. Let $M$ and $N$ be finite-dimensional vector subspaces of a vector space $V$ over $\mathbb{F}$. Then
$$\dim(M + N) + \dim(M \cap N) = \dim M + \dim N.$$

PROOF. Theorem 2.28 and two applications of Corollary 2.24a yield
$$\dim(M + N) - \dim M = \dim((M + N)/M) = \dim(N/(M \cap N)) = \dim N - \dim(M \cap N),$$
and the result follows. □
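Corollary 2.29 can be checked by brute force when the field is finite. The sketch below assumes $\mathbb{F}$ is the two-element field GF(2), chosen only because every subspace of $\mathrm{GF}(2)^n$ is a finite set that can be listed, so $M + N$ and $M \cap N$ are computed directly as sets rather than from a formula. The helper names are hypothetical.

```python
from itertools import product

def span_gf2(gens, n):
    """All vectors of GF(2)^n in the span of gens, as a set of bit tuples."""
    vectors = set()
    for coeffs in product((0, 1), repeat=len(gens)):
        vectors.add(tuple(sum(c * g[i] for c, g in zip(coeffs, gens)) % 2
                          for i in range(n)))
    return vectors

def dim_gf2(subspace):
    """A subspace of GF(2)^n with 2^d elements has dimension d."""
    return len(subspace).bit_length() - 1

def check_dimension_formula(m_gens, n_gens, n):
    """Compare dim(M + N) + dim(M ∩ N) with dim M + dim N by direct enumeration."""
    M, N = span_gf2(m_gens, n), span_gf2(n_gens, n)
    sum_space = {tuple((a + b) % 2 for a, b in zip(m, w)) for m in M for w in N}
    return dim_gf2(sum_space) + dim_gf2(M & N) == dim_gf2(M) + dim_gf2(N)
```

For instance, with $M$ spanned by $e_1, e_2$ and $N$ spanned by $e_2, e_3$ in $\mathrm{GF}(2)^4$, the sum has dimension 3 and the intersection dimension 1, and $3 + 1 = 2 + 2$.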


6.  Direct  Sums  and  Direct  Products  of  Vector  Spaces 

In this section we introduce the direct sum and direct product of two or more vector spaces over $\mathbb{F}$. When there are only finitely many such vector spaces, these constructions come to the same thing, and we call it “direct sum.” We begin with the case that two vector spaces are given.




We define two kinds of direct sums. The external direct sum of two vector spaces $V_1$ and $V_2$ over $\mathbb{F}$, written $V_1 \oplus V_2$, is a vector space obtained as follows. The underlying set is the set-theoretic product, i.e., the set $V_1 \times V_2$ of ordered pairs $(v_1, v_2)$ with $v_1 \in V_1$ and $v_2 \in V_2$. The operations of addition and scalar multiplication are defined coordinate by coordinate:
$$(u_1, u_2) + (v_1, v_2) = (u_1 + v_1, u_2 + v_2),$$
$$c(v_1, v_2) = (cv_1, cv_2),$$
and it is immediate that $V_1 \oplus V_2$ satisfies the defining properties of a vector space.

If $\{a_i\}$ is a basis of $V_1$ and $\{b_j\}$ is a basis of $V_2$, then it follows from the formula
$(v_1, v_2) = (v_1, 0) + (0, v_2)$ that $\{(a_i, 0)\} \cup \{(0, b_j)\}$ is a basis of $V_1 \oplus V_2$.
Consequently if $V_1$ and $V_2$ are finite-dimensional, then $V_1 \oplus V_2$ is finite-dimensional
with

$$\dim(V_1 \oplus V_2) = \dim V_1 + \dim V_2.$$
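The construction can be made concrete in a few lines of Python. This is a sketch under hypothetical choices, not part of the text: $V_1 = \mathbb{Q}^2$ and $V_2 = \mathbb{Q}^3$ with vectors stored as tuples and with their standard bases, so that the coefficients on the embedded basis $\{(a_i, 0)\} \cup \{(0, b_j)\}$ are just the coordinates. The function names are illustrative.

```python
from fractions import Fraction

# Hypothetical concrete model: V1 = Q^2 and V2 = Q^3, vectors stored as tuples.
def vadd(u, v):
    return tuple(a + b for a, b in zip(u, v))

def vscale(c, v):
    return tuple(c * a for a in v)

# Members of the external direct sum V1 (+) V2 are pairs (v1, v2);
# both operations act coordinate by coordinate.
def ds_add(x, y):
    return (vadd(x[0], y[0]), vadd(x[1], y[1]))

def ds_scale(c, x):
    return (vscale(c, x[0]), vscale(c, x[1]))

def std_basis(n):
    return [tuple(1 if j == i else 0 for j in range(n)) for i in range(n)]

a = std_basis(2)                 # a basis {a_i} of V1
b = std_basis(3)                 # a basis {b_j} of V2
zero1, zero2 = (0,) * 2, (0,) * 3

# The basis {(a_i, 0)} U {(0, b_j)} of V1 (+) V2: 2 + 3 = 5 vectors.
basis = [(ai, zero2) for ai in a] + [(zero1, bj) for bj in b]

def expand(v1, v2):
    """Rebuild (v1, v2) = (v1, 0) + (0, v2) as a combination of the embedded basis."""
    total = (zero1, zero2)
    for c, e in zip(list(v1) + list(v2), basis):
        total = ds_add(total, ds_scale(c, e))
    return total

x = ((Fraction(1, 2), 3), (4, -1, 0))
assert expand(*x) == x
```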

Associated to the construction of the external direct sum of two vector spaces
are four linear maps of interest:

$$\begin{aligned}
\text{two “projections,”}\quad & p_1 : V_1 \oplus V_2 \to V_1 && \text{with } p_1(v_1, v_2) = v_1, \\
& p_2 : V_1 \oplus V_2 \to V_2 && \text{with } p_2(v_1, v_2) = v_2, \\
\text{two “injections,”}\quad & i_1 : V_1 \to V_1 \oplus V_2 && \text{with } i_1(v_1) = (v_1, 0), \\
& i_2 : V_2 \to V_1 \oplus V_2 && \text{with } i_2(v_2) = (0, v_2).
\end{aligned}$$

These have the properties that

$$p_r i_s = \begin{cases} I & \text{on } V_s \text{ if } r = s, \\ 0 & \text{on } V_s \text{ if } r \neq s, \end{cases} \qquad i_1 p_1 + i_2 p_2 = I \text{ on } V_1 \oplus V_2.$$

The second notion of direct sum captures the idea of recognizing a situation as
canonically isomorphic to an external direct sum. This is based on the following
proposition.

Proposition 2.30. Let $V$ be a vector space over $\mathbb{F}$, and let $V_1$ and $V_2$ be vector
subspaces of $V$. Then the following conditions are equivalent:

(a) every member $v$ of $V$ decomposes uniquely as $v = v_1 + v_2$ with $v_1 \in V_1$
and $v_2 \in V_2$,

(b) $V_1 + V_2 = V$ and $V_1 \cap V_2 = 0$,

(c) the function from the external direct sum $V_1 \oplus V_2$ to $V$ given by $(v_1, v_2) \mapsto
v_1 + v_2$ is an isomorphism of vector spaces.

Remarks.

(1) If $V$ is a vector space with vector subspaces $V_1$ and $V_2$ satisfying the
equivalent conditions of Proposition 2.30, then we say that $V$ is the internal
direct sum of $V_1$ and $V_2$. It is customary to write $V = V_1 \oplus V_2$ in this case even
though what we have is a canonical isomorphism of the two sides, not an equality.

(2) The dimension formula

$$\dim(V_1 \oplus V_2) = \dim V_1 + \dim V_2$$

for an internal direct sum follows, on the one hand, from the corresponding
formula for external direct sums; it follows, on the other hand, by using (b) and
Corollary 2.29.

(3) In the proposition it is possible to establish a fourth equivalent condition as
follows: there exist linear maps $p_1 : V \to V$, $p_2 : V \to V$, $i_1 : \operatorname{image} p_1 \to V$,
and $i_2 : \operatorname{image} p_2 \to V$ such that

• $p_r i_s p_s$ equals $p_r$ if $r = s$ and equals $0$ if $r \neq s$,

• $i_1 p_1 + i_2 p_2 = I$, and

• $V_1 = \operatorname{image} i_1 p_1$ and $V_2 = \operatorname{image} i_2 p_2$.

PROOF. If (a) holds, then the existence of the decomposition $v = v_1 + v_2$
shows that $V_1 + V_2 = V$. If $v$ is in $V_1 \cap V_2$, then $0 = v + (-v)$ is a decomposition
of the kind in (a), and the uniqueness forces $v = 0$. Therefore $V_1 \cap V_2 = 0$. This
proves (b).

The function in (c) is certainly linear. If (b) holds and $v$ is given in $V$, then
the identity $V_1 + V_2 = V$ allows us to decompose $v$ as $v = v_1 + v_2$. This
proves that the linear map in (c) is onto. To see that it is one-one, suppose that
$v_1 + v_2 = 0$. Then $v_1 = -v_2$ shows that $v_1$ is in $V_1 \cap V_2$. By (b), this intersection
is $0$. Therefore $v_1 = v_2 = 0$, and the linear map in (c) is one-one.

If (c) holds, then the fact that the linear map in (c) is onto $V$ proves the existence
of the decomposition in (a). For uniqueness, suppose that $v_1 + v_2 = u_1 + u_2$
with $u_1$ and $v_1$ in $V_1$ and with $u_2$ and $v_2$ in $V_2$. Then $(u_1, u_2)$ and $(v_1, v_2)$ have
the same image under the linear map in (c). Since the function in (c) is assumed
one-one, we conclude that $(u_1, u_2) = (v_1, v_2)$. This proves the uniqueness of the
decomposition in (a). □
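Condition (a) can be carried out by computer when $V = \mathbb{Q}^n$. The sketch below is an illustration under stated assumptions: `B1` and `B2` are hypothetical bases of subspaces $V_1$ and $V_2$ with $V = V_1 \oplus V_2$, so that $B_1 \cup B_2$ is a basis of $\mathbb{Q}^n$ and the square system $\sum_k x_k b_k = v$ has a unique solution. If that assumption fails, the pivot search raises `StopIteration`.

```python
from fractions import Fraction

def solve(cols, v):
    """Solve sum_k x_k * cols[k] = v by Gauss-Jordan elimination over Q,
    assuming the vectors in cols form a basis of Q^n."""
    n = len(v)
    # augmented matrix [c_1 ... c_n | v], one row per coordinate
    M = [[Fraction(c[i]) for c in cols] + [Fraction(v[i])] for i in range(n)]
    for col in range(n):
        p = next(r for r in range(col, n) if M[r][col] != 0)  # exists for a basis
        M[col], M[p] = M[p], M[col]
        piv = M[col][col]
        M[col] = [x / piv for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]

def decompose(B1, B2, v):
    """Return (v1, v2) with v = v1 + v2, v1 in span(B1), v2 in span(B2)."""
    x = solve(B1 + B2, v)
    n, k1 = len(v), len(B1)
    v1 = tuple(sum(x[k] * B1[k][i] for k in range(k1)) for i in range(n))
    v2 = tuple(sum(x[k1 + k] * B2[k][i] for k in range(len(B2))) for i in range(n))
    return v1, v2

# V1 = span{(1,0,0), (0,1,0)}, V2 = span{(1,1,1)}: (2,3,5) = (-3,-2,0) + (5,5,5)
v1, v2 = decompose([(1, 0, 0), (0, 1, 0)], [(1, 1, 1)], (2, 3, 5))
```

Uniqueness of the solution of the square system is exactly the uniqueness in (a).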

If $V = V_1 \oplus V_2$ is a direct sum, then we can use the above projections and
injections to pass back and forth between linear maps with $V_1$ and $V_2$ as domain
or range and linear maps with $V$ as domain or range. This passage back and forth
is called the universal mapping property of $V_1 \oplus V_2$ and will be seen later in this
section to characterize $V_1 \oplus V_2$ up to canonical isomorphism. Let us be specific
about how this property works.
To arrange for $V$ to be the range, suppose that $U$ is a vector space over $\mathbb{F}$ and
that $L_1 : U \to V_1$ and $L_2 : U \to V_2$ are linear maps. Then we can define a linear
map $L : U \to V$ by $L = i_1 L_1 + i_2 L_2$, i.e., by

$$L(u) = (i_1 L_1 + i_2 L_2)(u) = (L_1(u), L_2(u)),$$

and we can recover $L_1$ and $L_2$ from $L$ by $L_1 = p_1 L$ and $L_2 = p_2 L$.

To arrange for $V$ to be the domain, suppose that $W$ is a vector space over $\mathbb{F}$
and that $M_1 : V_1 \to W$ and $M_2 : V_2 \to W$ are linear maps. Then we can define
a linear map $M : V \to W$ by $M = M_1 p_1 + M_2 p_2$, i.e., by

$$M(v_1, v_2) = M_1(v_1) + M_2(v_2),$$

and we can recover $M_1$ and $M_2$ from $M$ by $M_1 = M i_1$ and $M_2 = M i_2$.
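The two directions of the universal mapping property can be traced with plain Python functions. In this sketch every space is a hypothetical choice ($U = V_1 = \mathbb{Q}^2$, $V_2 = W = \mathbb{Q}^1$, vectors as tuples), and `L1`, `L2`, `M1`, `M2` are arbitrary example linear maps, not maps from the text. The sketch builds $L = i_1 L_1 + i_2 L_2$ and $M = M_1 p_1 + M_2 p_2$ and then recovers the pieces via $p_r L$ and $M i_r$.

```python
# Hypothetical spaces: U = V1 = Q^2, V2 = W = Q^1; elements of V1 (+) V2 are pairs.
ZERO1, ZERO2 = (0, 0), (0,)

def add(x, y):
    """Coordinatewise addition in V1 (+) V2."""
    return (tuple(a + b for a, b in zip(x[0], y[0])),
            tuple(a + b for a, b in zip(x[1], y[1])))

def p1(x): return x[0]
def p2(x): return x[1]
def i1(v1): return (v1, ZERO2)
def i2(v2): return (ZERO1, v2)

# Example linear maps L1 : U -> V1 and L2 : U -> V2 ...
def L1(u): return (u[0] + u[1], 2 * u[1])
def L2(u): return (u[0] - u[1],)

# ... combine into L = i1 L1 + i2 L2 : U -> V1 (+) V2.
def L(u): return add(i1(L1(u)), i2(L2(u)))

# Example linear maps M1 : V1 -> W and M2 : V2 -> W ...
def M1(v1): return (3 * v1[0] - v1[1],)
def M2(v2): return (5 * v2[0],)

# ... combine into M = M1 p1 + M2 p2 : V1 (+) V2 -> W.
def M(x): return tuple(a + b for a, b in zip(M1(p1(x)), M2(p2(x))))

u = (1, 4)
assert p1(L(u)) == L1(u) and p2(L(u)) == L2(u)    # L1 = p1 L, L2 = p2 L
assert M(i1((2, 7))) == M1((2, 7))                 # M1 = M i1
```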

The notion of direct sum readily extends to the direct sum of $n$ vector spaces
over $\mathbb{F}$. The external direct sum $V_1 \oplus \cdots \oplus V_n$ is the set of ordered $n$-tuples
$(v_1, \dots, v_n)$ with each $v_j$ in $V_j$ and with addition and scalar multiplication defined
coordinate by coordinate. In the finite-dimensional case we have

$$\dim(V_1 \oplus \cdots \oplus V_n) = \dim V_1 + \cdots + \dim V_n.$$

If $V_1, \dots, V_n$ are given as vector subspaces of a vector space $V$, then we say
that $V$ is the internal direct sum of $V_1, \dots, V_n$ if the equivalent conditions of
Proposition 2.31 below are satisfied. In this case we write $V = V_1 \oplus \cdots \oplus V_n$
even though once again we really have a canonical isomorphism rather than an
equality.

Proposition 2.31. Let $V$ be a vector space over $\mathbb{F}$, and let $V_1, \dots, V_n$ be vector
subspaces of $V$. Then the following conditions are equivalent:

(a) every member $v$ of $V$ decomposes uniquely as $v = v_1 + \cdots + v_n$ with
$v_j \in V_j$ for $1 \le j \le n$,

(b) $V_1 + \cdots + V_n = V$ and also $V_j \cap (V_1 + \cdots + V_{j-1} + V_{j+1} + \cdots + V_n) = 0$
for each $j$ with $1 \le j \le n$,

(c) the function from the external direct sum $V_1 \oplus \cdots \oplus V_n$ to $V$ given by
$(v_1, \dots, v_n) \mapsto v_1 + \cdots + v_n$ is an isomorphism of vector spaces.

Proposition 2.31 is proved in the same way as Proposition 2.30, and the
expected analog of Remark 3 with that proposition is valid as well. Notice
that the second condition in (b) is stronger than the condition that $V_i \cap V_j = 0$ for
all $i \neq j$. Figure 2.3 illustrates how the condition $V_i \cap V_j = 0$ for all $i \neq j$ can
be satisfied even though (b) is not satisfied and even though the vector subspaces
do not therefore form a direct sum.
Figure 2.3. Three 1-dimensional vector subspaces of $\mathbb{R}^2$
such that each pair has intersection $0$.
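The situation of Figure 2.3 can be checked directly. In this sketch the three lines are a hypothetical choice, spanned by $(1, 0)$, $(0, 1)$, and $(1, 1)$. Two nonzero vectors $u, w$ of $\mathbb{R}^2$ span lines meeting only in $0$ exactly when $u_0 w_1 - u_1 w_0 \neq 0$. The vector $(1, 1)$ then has two different decompositions $v_1 + v_2 + v_3$, so condition (a) of Proposition 2.31 fails.

```python
from itertools import combinations

# Hypothetical spanning vectors for three lines through 0 in R^2.
lines = {"V1": (1, 0), "V2": (0, 1), "V3": (1, 1)}

def det2(u, w):
    """For nonzero u, w: zero exactly when w lies on the line spanned by u."""
    return u[0] * w[1] - u[1] * w[0]

# Every pair of the lines has intersection 0.
pairwise_zero = all(det2(lines[s], lines[t]) != 0 for s, t in combinations(lines, 2))

# Two different decompositions (v1, v2, v3) of the same vector (1, 1).
decomp_a = ((0, 0), (0, 0), (1, 1))
decomp_b = ((1, 0), (0, 1), (0, 0))

def total(d):
    """The sum v1 + v2 + v3 of a decomposition."""
    return tuple(sum(coords) for coords in zip(*d))
```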

If $V = V_1 \oplus \cdots \oplus V_n$ is a direct sum, then we can define projections $p_1, \dots, p_n$
and injections $i_1, \dots, i_n$ in the expected way, and we again get a universal mapping
property. That is, we can pass back and forth between linear maps with $V_1, \dots, V_n$
as domain or range and linear maps with $V$ as domain or range. The argument
given above for $n = 2$ is easily adjusted to handle general $n$, and we omit the
details.

To generalize the above notions to infinitely many vector spaces, there are two
quite different ways of proceeding. Let us treat first the external constructions.
Let a nonempty collection of vector spaces $V_\alpha$ over $\mathbb{F}$ be given, one for each $\alpha \in A$.
The external direct sum $\bigoplus_{\alpha \in A} V_\alpha$ is the set of all tuples $\{v_\alpha\}$ in the Cartesian
product $\times_{\alpha \in A} V_\alpha$ with all but finitely many $v_\alpha$ equal to $0$ and with addition and
scalar multiplication defined coordinate by coordinate. For this construction we
obtain a basis as the union of embedded bases of the constituent spaces. The
external direct product $\prod_{\alpha \in A} V_\alpha$ is the set of all tuples $\{v_\alpha\}$ in $\times_{\alpha \in A} V_\alpha$,
again with addition and scalar multiplication defined coordinate by coordinate.
When there are only finitely many factors $V_1, \dots, V_n$, the external direct product,
which manifestly coincides with the external direct sum, is sometimes denoted
by $V_1 \times \cdots \times V_n$. For the external direct product when there are infinitely many
factors, there is no evident way to obtain a basis of the product from bases of the
constituents.

The projections and injections that we defined in the case of finitely many
vector spaces are still meaningful here. The universal mapping property is still
valid as well, but it splinters into one form for direct sums and another form for
direct products. The formulas given above for using linear maps with the $V_\alpha$'s
as domain or range to define linear maps with the direct sum or direct product
as domain or range may involve sums with infinitely many nonzero terms, and
they are not directly usable. Instead, the formulas that continue to make sense
are the ones for recovering linear maps with the $V_\alpha$'s as domain or range from
linear maps with the direct sum or direct product as domain or range. These turn
out to determine the formulas uniquely for the linear maps with the direct sum
or direct product as domain or range. In other words, the appropriate universal
mapping property uniquely determines the direct sum or direct product up to an
isomorphism that respects the relevant projections and injections.

Let us see to the details. We denote typical members of $\prod_{\alpha \in A} V_\alpha$ and $\bigoplus_{\alpha \in A} V_\alpha$
by $\{v_\alpha\}_{\alpha \in A}$, with the understanding that only finitely many $v_\alpha$ can be nonzero in
the case of the direct sum. The formulas are

$$\begin{aligned}
p_\beta &: \prod_{\alpha \in A} V_\alpha \to V_\beta && \text{with } p_\beta(\{v_\alpha\}_{\alpha \in A}) = v_\beta, \\
i_\beta &: V_\beta \to \bigoplus_{\alpha \in A} V_\alpha && \text{with } i_\beta(v_\beta) = \{w_\alpha\}_{\alpha \in A} \text{ and } w_\alpha = \begin{cases} v_\beta & \text{if } \alpha = \beta, \\ 0 & \text{if } \alpha \neq \beta. \end{cases}
\end{aligned}$$

If $U$ is a vector space over $\mathbb{F}$ and if a linear map $L_\beta : U \to V_\beta$ is given for each
$\beta \in A$, we can obtain a linear map $L : U \to \prod_{\alpha \in A} V_\alpha$ that satisfies $p_\beta L = L_\beta$
for all $\beta$. The definition that makes perfectly good sense is

$$L(u) = \{L(u)_\alpha\}_{\alpha \in A} = \{L_\alpha(u)\}_{\alpha \in A}.$$

What does not make sense is to try to express the right side in terms of the
injections $i_\alpha$; we cannot write the right side as $\sum_{\alpha \in A} i_\alpha(L_\alpha(u))$ because infinitely
many terms might be nonzero.

If $W$ is a vector space and a linear map $M_\beta : V_\beta \to W$ is given for each $\beta$, we
can obtain a linear map $M : \bigoplus_{\alpha \in A} V_\alpha \to W$ that satisfies $M i_\beta = M_\beta$ for all $\beta$;
the definition that makes perfectly good sense is

$$M(\{v_\alpha\}_{\alpha \in A}) = \sum_{\alpha \in A} M_\alpha(v_\alpha).$$

The right side is meaningful since only finitely many $v_\alpha$ can be nonzero. It can
be misleading to write the formula as $M = \sum_{\alpha \in A} M_\alpha p_\alpha$ because infinitely many
of the linear maps $M_\alpha p_\alpha$ can be nonzero functions.

In any event, we have a universal mapping property in both cases, for the direct
product with the projections in place and for the direct sum with the injections
in place. Let us see that these universal mapping properties characterize direct
products and direct sums up to an isomorphism respecting the projections and
injections, and that they allow us to define and recognize “internal” direct products
and direct sums.

A direct product of a set of vector spaces $V_\alpha$ over $\mathbb{F}$ for $\alpha \in A$ consists of
a vector space $V$ and a system of linear maps $p_\alpha : V \to V_\alpha$ with the following
universal mapping property: whenever $U$ is a vector space and $\{L_\alpha\}$ is a system
of linear maps $L_\alpha : U \to V_\alpha$, then there exists a unique linear map $L : U \to V$
such that $p_\alpha L = L_\alpha$ for all $\alpha$. See Figure 2.4. The external direct product
establishes existence of a direct product, and Proposition 2.32 below establishes
its uniqueness up to an isomorphism of the $V$'s that respects the $p_\alpha$'s. A direct
product is said to be internal if each $V_\alpha$ is a vector subspace of $V$ and if for each
$\alpha$, the restriction $p_\alpha\big|_{V_\alpha}$ is the identity map on $V_\alpha$. Because of the uniqueness, this
definition of internal direct product is consistent with the earlier one when there
are only finitely many $V_\alpha$'s.


Va 


U 


V 


L 


FIGURE  2.4.  Universal  mapping  property  of  a  direct  product  of  vector  spaces. 
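The difference between the two constructions shows up clearly in a small model. In this Python sketch the index set $A = \{0, 1, 2, \dots\}$ and every $V_\alpha = \mathbb{Q}$ are hypothetical choices. A member of the direct product is modeled as an arbitrary function $\alpha \mapsto v_\alpha$, and a member of the direct sum as a dict listing its finitely many nonzero coordinates. `product_map` implements $L(u) = \{L_\alpha(u)\}$, which may have infinitely many nonzero coordinates. `sum_map` implements $M(\{v_\alpha\}) = \sum_\alpha M_\alpha(v_\alpha)$, which is a finite sum only because the dict is finite.

```python
from fractions import Fraction

def proj(beta):
    """Projection p_beta on the direct product: read off coordinate beta."""
    return lambda v: v(beta)

def inj(beta):
    """Injection i_beta into the direct sum: v_beta in slot beta, 0 elsewhere."""
    return lambda vb: {beta: vb} if vb != 0 else {}

def product_map(Ls):
    """Given L_alpha = Ls(alpha) : U -> V_alpha, return L with L(u) = {L_alpha(u)}."""
    return lambda u: (lambda alpha: Ls(alpha)(u))

def sum_map(Ms):
    """Given M_alpha = Ms(alpha) : V_alpha -> W, return M with
    M({v_alpha}) = sum of M_alpha(v_alpha) over the finitely many stored entries."""
    return lambda v: sum((Ms(alpha)(va) for alpha, va in v.items()), Fraction(0))

Ls = lambda alpha: (lambda u: (alpha + 1) * u)   # every L_alpha is nonzero
L = product_map(Ls)
Ms = lambda alpha: (lambda va: va / 2 ** alpha)
M = sum_map(Ms)
```

Here $L(u)$ has every coordinate nonzero when $u \neq 0$, so it is a member of the product but not of the direct sum.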


Proposition 2.32. Let $A$ be a nonempty set of vector spaces over $\mathbb{F}$, and let
$V_\alpha$ be the vector space corresponding to the member $\alpha$ of $A$. If $(V, \{p_\alpha\})$ and
$(V^*, \{p^*_\alpha\})$ are two direct products of the $V_\alpha$'s, then the linear maps $p_\alpha : V \to V_\alpha$
and $p^*_\alpha : V^* \to V_\alpha$ are onto $V_\alpha$, there exists a unique linear map $L : V^* \to V$
such that $p^*_\alpha = p_\alpha L$ for all $\alpha \in A$, and $L$ is invertible.

PROOF. In Figure 2.4 let $U = V^*$ and $L_\alpha = p^*_\alpha$. If $L : V^* \to V$ is the linear
map produced by the fact that $V$ is a direct product, then we have $p_\alpha L = p^*_\alpha$ for
all $\alpha$. Reversing the roles of $V$ and $V^*$, we obtain a linear map $L^* : V \to V^*$
with $p^*_\alpha L^* = p_\alpha$ for all $\alpha$. Therefore $p_\alpha(LL^*) = (p_\alpha L)L^* = p^*_\alpha L^* = p_\alpha$.

In Figure 2.4 we next let $U = V$ and $L_\alpha = p_\alpha$ for all $\alpha$. Then the identity
$1_V$ on $V$ has the same property $p_\alpha 1_V = p_\alpha$ relative to all $p_\alpha$ that $LL^*$ has, and
the uniqueness says that $LL^* = 1_V$. Reversing the roles of $V$ and $V^*$, we obtain
$L^*L = 1_{V^*}$. Therefore $L$ is invertible.

For uniqueness suppose that $\Phi : V^* \to V$ is another linear map with $p^*_\alpha =
p_\alpha \Phi$ for all $\alpha \in A$. Then the argument of the previous paragraph shows that
$L^*\Phi = 1_{V^*}$. Applying $L$ on the left gives $\Phi = (LL^*)\Phi = L(L^*\Phi) = L 1_{V^*} =
L$. Thus $\Phi = L$.

Finally we have to show that the $\alpha$th map of a direct product is onto $V_\alpha$. It
is enough to show that $p^*_\alpha$ is onto $V_\alpha$. Taking $V$ as the external direct product
$\prod_{\alpha \in A} V_\alpha$ with $p_\alpha$ equal to the coordinate mapping, form the invertible linear map
$L^* : V \to V^*$ that has just been proved to exist. This satisfies $p_\alpha = p^*_\alpha L^*$ for all
$\alpha \in A$. Since $p_\alpha$ is onto $V_\alpha$, $p^*_\alpha$ must be onto $V_\alpha$. □

A direct sum of a set of vector spaces $V_\alpha$ over $\mathbb{F}$ for $\alpha \in A$ consists of a vector
space $V$ and a system of linear maps $i_\alpha : V_\alpha \to V$ with the following universal
mapping property: whenever $W$ is a vector space and $\{M_\alpha\}$ is a system of linear
maps $M_\alpha : V_\alpha \to W$, then there exists a unique linear map $M : V \to W$ such
that $M i_\alpha = M_\alpha$ for all $\alpha$. See Figure 2.5. The external direct sum establishes
existence of a direct sum, and Proposition 2.33 below establishes its uniqueness
up to isomorphism of the $V$'s that respects the $i_\alpha$'s. A direct sum is said to be
internal if each $V_\alpha$ is a vector subspace of $V$ and if for each $\alpha$, the map $i_\alpha$ is the
inclusion map of $V_\alpha$ into $V$. Because of the uniqueness, this definition of internal
direct sum is consistent with the earlier one when there are only finitely many $V_\alpha$'s.


Figure 2.5. Universal mapping property of a direct sum of vector spaces.


Proposition 2.33. Let $A$ be a nonempty set of vector spaces over $\mathbb{F}$, and let
$V_\alpha$ be the vector space corresponding to the member $\alpha$ of $A$. If $(V, \{i_\alpha\})$ and
$(V^*, \{i^*_\alpha\})$ are two direct sums of the $V_\alpha$'s, then the linear maps $i_\alpha : V_\alpha \to V$ and
$i^*_\alpha : V_\alpha \to V^*$ are one-one, there exists a unique linear map $M : V \to V^*$ such
that $i^*_\alpha = M i_\alpha$ for all $\alpha \in A$, and $M$ is invertible.

PROOF. In Figure 2.5 let $W = V^*$ and $M_\alpha = i^*_\alpha$. If $M : V \to V^*$ is the linear
map produced by the fact that $V$ is a direct sum, then we have $M i_\alpha = i^*_\alpha$ for all
$\alpha$. Reversing the roles of $V$ and $V^*$, we obtain a linear map $M^* : V^* \to V$ with
$M^* i^*_\alpha = i_\alpha$ for all $\alpha$. Therefore $(M^*M) i_\alpha = M^* i^*_\alpha = i_\alpha$.

In Figure 2.5 we next let $W = V$ and $M_\alpha = i_\alpha$ for all $\alpha$. Then the identity $1_V$
on $V$ has the same property $1_V i_\alpha = i_\alpha$ relative to all $i_\alpha$ that $M^*M$ has, and the
uniqueness says that $M^*M = 1_V$. Reversing the roles of $V$ and $V^*$, we obtain
$MM^* = 1_{V^*}$. Therefore $M$ is invertible.

For uniqueness suppose that $\Phi : V \to V^*$ is another linear map with $i^*_\alpha = \Phi i_\alpha$
for all $\alpha \in A$. Then the argument of the previous paragraph shows that $M^*\Phi =
1_V$. Applying $M$ on the left gives $\Phi = (MM^*)\Phi = M(M^*\Phi) = M 1_V = M$.
Thus $\Phi = M$.

Finally we have to show that the $\alpha$th map of a direct sum is one-one on $V_\alpha$. It
is enough to show that $i^*_\alpha$ is one-one on $V_\alpha$. Taking $V$ as the external direct sum
$\bigoplus_{\alpha \in A} V_\alpha$ with $i_\alpha$ equal to the embedding mapping, form the invertible linear map
$M^* : V^* \to V$ that has just been proved to exist. This satisfies $i_\alpha = M^* i^*_\alpha$ for all
$\alpha \in A$. Since $i_\alpha$ is one-one, $i^*_\alpha$ must be one-one. □


7.  Determinants 


A “determinant” is a certain scalar attached initially to any square matrix and
ultimately to any linear map from a finite-dimensional vector space into itself.
The definition is presumably known from high-school algebra in the case of
2-by-2 and 3-by-3 matrices:

$$\det\begin{pmatrix} a & b \\ c & d \end{pmatrix} = ad - bc, \qquad
\det\begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix} = aei + bfg + cdh - afh - bdi - ceg.$$


For $n$-by-$n$ square matrices the determinant function will have the following
important properties:

(i) $\det(AB) = \det A \det B$,

(ii) $\det I = 1$,

(iii) $\det A = 0$ if and only if $A$ has no inverse.


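Once we have constructed the determinant function with these properties, we
can then extend the function to be defined on all linear maps $L : V \to V$ with $V$
finite-dimensional. To do so, we let $\Gamma$ be any ordered basis of $V$, and we define

$$\det L = \det\begin{pmatrix} & L & \\ \Gamma & & \Gamma \end{pmatrix}.$$

If $\Delta$ is another ordered basis, then

$$\det\begin{pmatrix} & L & \\ \Delta & & \Delta \end{pmatrix} = \det\left(\begin{pmatrix} & I & \\ \Delta & & \Gamma \end{pmatrix}\begin{pmatrix} & L & \\ \Gamma & & \Gamma \end{pmatrix}\begin{pmatrix} & I & \\ \Gamma & & \Delta \end{pmatrix}\right),$$

and this equals $\det\begin{pmatrix} & L & \\ \Gamma & & \Gamma \end{pmatrix}$ by (i) since $\begin{pmatrix} & I & \\ \Delta & & \Gamma \end{pmatrix}$ and $\begin{pmatrix} & I & \\ \Gamma & & \Delta \end{pmatrix}$ are inverses of each
other and since their determinants, by (i) and (ii), are reciprocals. Hence the
definition of $\det L$ is independent of the choice of ordered basis, and determinant
is well defined on the linear map $L : V \to V$. It is then immediate that the
determinant function on linear maps from $V$ into $V$ satisfies (i), (ii), and (iii)
above.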

Thus it is enough to establish the determinant function on $n$-by-$n$ matrices.
Setting matters up in a useful way involves at least one subtle step, but much of
this step has fortunately already been carried out in the discussion of signs of
permutations in Section 1.4. To proceed, we view $\det$ on $n$-by-$n$ matrices over
$\mathbb{F}$ as a function of the $n$ rows of the matrix, rather than the matrix itself. We
write $V$ for the vector space $M_{1n}(\mathbb{F})$ of all $n$-dimensional row vectors. A function
$f : V \times \cdots \times V \to \mathbb{F}$ defined on ordered $k$-tuples of members of $V$ is called a
$k$-multilinear functional or $k$-linear functional if it depends linearly on each of
the $k$ vector variables when the other $k - 1$ vector variables are held fixed. For
example,


f((a  b),(c  d))  =  ac  +  b(c  +  d)  +  \ad 
is a 2-linear functional on $M_{12}(\mathbb{F}) \times M_{12}(\mathbb{F})$. A little more generally and more
suggestively,

$$g\big((a\ b),(c\ d)\big) = \ell_1(a\ b)\,\ell_2(c\ d) + \ell_3(a\ b)\,\ell_4(c\ d)$$

is a 2-linear functional on $M_{12}(\mathbb{F}) \times M_{12}(\mathbb{F})$ whenever $\ell_1, \dots, \ell_4$ are linear
functionals on $M_{12}(\mathbb{F})$.

Let $\{v_1, \dots, v_n\}$ be a basis of $V$. Then a $k$-multilinear functional as above
is determined by its value on all $k$-tuples of basis vectors $(v_{i_1}, \dots, v_{i_k})$. (Here
$i_1, \dots, i_k$ are integers between $1$ and $n$.) The reason is that we can fix all but
the first variable and expand out the expression by linearity so that only a basis
vector remains in each term for the first variable; for each resulting term we can
fix all but the second variable and expand out the expression by linearity; and so
on. Conversely if we specify arbitrary scalars for the values on each such $k$-tuple,
then we can define a $k$-multilinear functional assuming those values on the tuples
of basis vectors.

A $k$-multilinear functional $f$ on $k$-tuples from $M_{1n}(\mathbb{F})$ is said to be alternating
if $f$ is $0$ whenever two of the variables are equal.

Example. For $k = 2$ and $n = 2$, we use $\{v_1 = (1\ 0),\ v_2 = (0\ 1)\}$ as basis.
Then a 2-multilinear functional $f$ is determined by $f(v_1, v_1)$, $f(v_1, v_2)$,
$f(v_2, v_1)$, and $f(v_2, v_2)$. If $f$ is alternating, then $f(v_1, v_1) = f(v_2, v_2) = 0$.
But also $f(v_1 + v_2, v_1 + v_2) = 0$, and expansion via 2-multilinearity gives

$$f(v_1, v_1) + f(v_1, v_2) + f(v_2, v_1) + f(v_2, v_2) = 0.$$

We have already seen that the first and last terms on the left side are $0$, and thus
$f(v_2, v_1) = -f(v_1, v_2)$. Therefore $f$ is completely determined by $f(v_1, v_2)$.

The principle involved in the computation within the example is valid more
generally: whenever a multilinear functional $f$ is alternating and two of its
arguments are interchanged, then the value of $f$ is multiplied by $-1$. In fact,
let us suppress all variables except for the $i$th and the $j$th. Then we have

$$\begin{aligned}
0 = f(v + w, v + w) &= f(v + w, v) + f(v + w, w) \\
&= f(v, v) + f(w, v) + f(v, w) + f(w, w) = f(w, v) + f(v, w).
\end{aligned}$$

Theorem 2.34. For $M_{1n}(\mathbb{F})$, the vector space of alternating $n$-multilinear
functionals has dimension 1, and a nonzero such functional has nonzero value on
$(e_1^t, \dots, e_n^t)$, where $\{e_1, \dots, e_n\}$ is the standard basis of $\mathbb{F}^n$. Let $f_0$ be the unique
such alternating $n$-multilinear functional taking the value $1$ on $(e_1^t, \dots, e_n^t)$. If a
function $\det : M_{nn}(\mathbb{F}) \to \mathbb{F}$ is defined by

$$\det A = f_0(A_{1\cdot}, \dots, A_{n\cdot})$$



when $A$ has rows $A_{1\cdot}, \dots, A_{n\cdot}$, then $\det$ has the properties that

(a) $\det(AB) = \det A \det B$,

(b) $\det I = 1$,

(c) $\det A = 0$ if and only if $A$ has no inverse,

(d) $\det A = \sum_\sigma (\operatorname{sgn} \sigma) A_{1\sigma(1)} A_{2\sigma(2)} \cdots A_{n\sigma(n)}$, the sum being taken over all
permutations $\sigma$ of $\{1, \dots, n\}$.

PROOF OF UNIQUENESS. Let $f$ be an alternating $n$-multilinear functional, and
let $\{u_1, \dots, u_n\}$ be the basis of the space of row vectors defined by $u_i = e_i^t$. Since
$f$ is multilinear, $f$ is determined by its values on all $n$-tuples $(u_{k_1}, \dots, u_{k_n})$.
Since $f$ is alternating, $f(u_{k_1}, \dots, u_{k_n}) = 0$ unless the $k_j$ are distinct, i.e.,
unless $(u_{k_1}, \dots, u_{k_n})$ is of the form $(u_{\sigma(1)}, \dots, u_{\sigma(n)})$ for some permutation
$\sigma$. We have seen that the value of $f$ on an $n$-tuple of rows is multiplied
by $-1$ if two of the rows are interchanged. Corollary 1.22 and Proposition
1.24b consequently together imply that the value of $f$ on an $n$-tuple is multiplied
by $\operatorname{sgn} \sigma$ if the members of the $n$-tuple are permuted by $\sigma$. Therefore
$f(u_{\sigma(1)}, \dots, u_{\sigma(n)}) = (\operatorname{sgn} \sigma) f(u_1, \dots, u_n)$, and $f$ is completely determined
by its value on $(u_1, \dots, u_n)$. We conclude that the vector space of alternating
$n$-multilinear functionals has dimension at most 1. □

PROOF OF EXISTENCE. Define $\det A$, and therefore also $f_0$, by (d). Each term
in this definition is the product of $n$ linear functionals, the $k$th linear functional
being applied to the $k$th argument of $f_0$, and $f_0$ is consequently $n$-multilinear.
To see that $f_0$ is alternating, suppose that the $i$th and $j$th rows are equal with
$i \neq j$. If $\tau$ is the transposition of $i$ and $j$, then $A_{1\sigma\tau(1)} A_{2\sigma\tau(2)} \cdots A_{n\sigma\tau(n)} =
A_{1\sigma(1)} A_{2\sigma(2)} \cdots A_{n\sigma(n)}$, and Lemma 1.23 hence shows that

$$(\operatorname{sgn} \sigma\tau) A_{1\sigma\tau(1)} A_{2\sigma\tau(2)} \cdots A_{n\sigma\tau(n)} + (\operatorname{sgn} \sigma) A_{1\sigma(1)} A_{2\sigma(2)} \cdots A_{n\sigma(n)} = 0.$$

Thus if we compute the sum in (d) by grouping pairs of terms, the one for $\sigma\tau$ and
the one for $\sigma$ if $\operatorname{sgn} \sigma = +1$, we see that the whole sum is $0$. Thus $f_0$ is alternating.
Finally when $A$ is the identity matrix $I$, we see that $A_{1\sigma(1)} A_{2\sigma(2)} \cdots A_{n\sigma(n)} = 0$
unless $\sigma$ is the identity permutation, and then the product is $1$. Since $\operatorname{sgn} 1 = +1$,
$\det I = +1$. We conclude that the vector space of alternating $n$-multilinear
functionals has dimension exactly 1. □

PROOF OF PROPERTIES OF det. Fix an $n$-by-$n$ matrix $B$. Since $f_0$ is alternating
$n$-multilinear, so is $(v_1, \dots, v_n) \mapsto f_0(v_1 B, \dots, v_n B)$. The vector space of
alternating $n$-multilinear functionals has been proved to be of dimension 1, and
therefore $f_0(v_1 B, \dots, v_n B) = c(B) f_0(v_1, \dots, v_n)$ for some scalar $c(B)$. In the
notation with $\det$, this equation reads $\det(AB) = c(B) \det A$. Putting $A = I$, we
obtain $\det B = c(B) \det I$. Thus $c(B) = \det B$, and (a) follows. We have already
proved (b), and (d) was the definition of $\det A$. We are left with (c). If $A^{-1}$
exists, then (a) and (b) give $\det(A^{-1}) \det A = \det I = 1$, and hence $\det A \neq 0$.
If $A^{-1}$ does not exist, then Theorem 1.30 and Proposition 1.27c show that the
reduced row-echelon form $R$ of $A$ has a row of $0$'s. We combine Proposition 1.29,
conclusion (a), the invertibility of elementary matrices, and the fact that invertible
matrices have nonzero determinant, and we see that $\det A$ is the product of $\det R$
and a nonzero scalar. Since $\det$ is linear as a function of each row and since $R$
has a row of $0$'s, $\det R = 0$. Therefore $\det A = 0$. This completes the proof of
the theorem. □
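Formula (d) can be transcribed directly into Python as a check on small matrices. In this sketch, which is our own illustration, $\operatorname{sgn} \sigma$ is computed as $(-1)^{\text{number of inversions of } \sigma}$. That is an equivalent characterization of the sign which we assume here rather than reproduce the development of Section 1.4. Indices are 0-based, and `det_leibniz` and `matmul` are illustrative names. The sum has $n!$ terms, so this is practical only for small $n$.

```python
from itertools import permutations
from math import prod

def sgn(sigma):
    """Sign of a permutation of 0..n-1, as (-1)^(number of inversions)."""
    n = len(sigma)
    inversions = sum(1 for a in range(n) for b in range(a + 1, n) if sigma[a] > sigma[b])
    return -1 if inversions % 2 else 1

def det_leibniz(A):
    """det A = sum over sigma of sgn(sigma) * A[0][sigma(0)] * ... * A[n-1][sigma(n-1)]."""
    n = len(A)
    return sum(sgn(s) * prod(A[r][s[r]] for r in range(n)) for s in permutations(range(n)))

def matmul(A, B):
    return [[sum(A[r][k] * B[k][c] for k in range(len(B))) for c in range(len(B[0]))]
            for r in range(len(A))]

A = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
B = [[2, 1, 0], [0, 1, 3], [1, 0, 1]]
# Property (a): det(AB) = det A * det B
assert det_leibniz(matmul(A, B)) == det_leibniz(A) * det_leibniz(B)
```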


The  fast  procedure  for  evaluating  determinants  is  to  use  row  reduction,  keeping 
track  of  what  happens.  The  effect  of  each  kind  of  row  operation  on  a  determinant 
and  the  reasons  the  function  det  behaves  in  this  way  are  as  follows: 

(i) Interchange two rows. This operation multiplies the determinant by $-1$
because of the alternating property.

(ii) Multiply a row by a nonzero scalar $c$. This operation multiplies the
determinant by $c$ because of the linearity of determinant as a function of
that row.

(iii) Replace the $i$th row by the sum of it and a multiple of the $j$th row with
$j \neq i$. This operation leaves the determinant unchanged. In fact, the
matrix whose $i$th row is replaced by the $j$th row has determinant $0$ by the
alternating property, and the rest follows by linearity in the $i$th row.

As with row reduction the number of steps required to compute a determinant
this way is $\le Cn^3$ in the $n$-by-$n$ case.

A  certain  savings  of  computation  is  possible  as  compared  with  full-fledged 
row  reduction.  Namely,  we  have  only  to  arrange  for  the  reduced  matrix  to  be  0 
below  the  main  diagonal,  and  then  the  determinant  of  the  reduced  matrix  will 
be  the  product  of  the  diagonal  entries,  by  inspection  of  the  formula  in  Theorem 
2.34d. 


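Example. For the matrix $\begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 10 \end{pmatrix}$, we have

$$\det\begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 10 \end{pmatrix}
= \det\begin{pmatrix} 1 & 2 & 3 \\ 0 & -3 & -6 \\ 0 & -6 & -11 \end{pmatrix}
= -3\det\begin{pmatrix} 1 & 2 & 3 \\ 0 & 1 & 2 \\ 0 & -6 & -11 \end{pmatrix}
= -3\det\begin{pmatrix} 1 & 2 & 3 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{pmatrix}
= -3.$$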
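The procedure just illustrated can be sketched in Python with exact rational arithmetic. This is a minimal version under our own choices: it reduces to upper-triangular form using only operations (i) and (iii), flipping a sign on each interchange, and then multiplies the diagonal entries. It never scales a row, so unlike the hand computation above it does not factor out $-3$; the answer is the same. The function name is illustrative. The work is on the order of $n^3$ arithmetic operations.

```python
from fractions import Fraction

def det_by_row_reduction(A):
    """Determinant via reduction to upper-triangular form over Q."""
    M = [[Fraction(x) for x in row] for row in A]
    n = len(M)
    sign = 1
    for c in range(n):
        p = next((r for r in range(c, n) if M[r][c] != 0), None)
        if p is None:
            return Fraction(0)      # triangular form would have a 0 on the diagonal
        if p != c:
            M[c], M[p] = M[p], M[c]
            sign = -sign            # operation (i) multiplies det by -1
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            M[r] = [a - f * b for a, b in zip(M[r], M[c])]   # operation (iii)
    result = Fraction(sign)
    for c in range(n):
        result *= M[c][c]           # det of a triangular matrix: product of diagonal
    return result

print(det_by_row_reduction([[1, 2, 3], [4, 5, 6], [7, 8, 10]]))   # -3
```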

We  conclude  this  section  with  a  number  of  formulas  for  determinants. 
Proposition 2.35. If $A$ is an $n$-by-$n$ square matrix, then $\det A^t = \det A$.

PROOF. Corollary 2.9 says that the row space and the column space of $A$ have
the same dimension, and $A$ is invertible if and only if the row space has dimension
$n$. Thus $A$ is invertible if and only if $A^t$ is invertible, and Theorem 2.34c thus
shows that $\det A = 0$ if and only if $\det A^t = 0$. Now suppose that $\det A$ and $\det A^t$
are nonzero. Then we can write $A = E_1 \cdots E_r$ with each $E_j$ an elementary matrix
of one of the three types. Theorem 2.34a shows that $\det A = \prod_{j=1}^r \det E_j$ and
$\det A^t = \prod_{j=1}^r \det E_j^t$, and hence it is enough to prove that $\det E_j^t = \det E_j$ for
each $j$. For $E_j$ of either of the first two types, $E_j^t = E_j$ and there is nothing to
prove. For $E_j$ of the third type, we have $\det E_j^t = \det E_j = 1$. The result follows.

□

Proposition 2.36 (expansion in cofactors). Let $A$ be an $n$-by-$n$ matrix, and let
$\widetilde{A}_{ij}$ be the square matrix of size $n - 1$ obtained by deleting the $i$th row and the $j$th
column. Then

(a) for any $j$, $\det A = \sum_{i=1}^n (-1)^{i+j} A_{ij} \det \widetilde{A}_{ij}$, i.e., $\det A$ may be calculated
by “expansion in cofactors” about the $j$th column,

(b) for any $i$, $\det A = \sum_{j=1}^n (-1)^{i+j} A_{ij} \det \widetilde{A}_{ij}$, i.e., $\det A$ may be calculated
by “expansion in cofactors” about the $i$th row.

Remarks. If this formula is iterated, we obtain a procedure for evaluating a
determinant in about $Cn!$ steps. This procedure amounts to using the formula for
$\det A$ in Theorem 2.34d and is ordinarily not of practical use. However, it is of
theoretical use, and Corollary 2.37 will provide a simple example of a theoretical
application.

PROOF. It is enough to prove (a) since (b) then follows by combining (a) and Proposition 2.35. In (a), the right side is 1 when A = I, and it is enough by Theorem 2.34 to prove that the right side is alternating and n-multilinear. Each term on the right side is n-multilinear, and hence so is the whole expression. To see that the right side is alternating, suppose that the kth and lth rows are equal with k < l. The kth and lth rows are both present in $\widetilde A_{ij}$ if i is not equal to k or l, and thus each $\det \widetilde A_{ij}$ is 0 for i not equal to k or l. We are left with showing that

$$(-1)^{k+j} A_{kj} \det \widetilde A_{kj} + (-1)^{l+j} A_{lj} \det \widetilde A_{lj} = 0.$$

The two matrices $\widetilde A_{kj}$ and $\widetilde A_{lj}$ have the same rows but in a different order (recall that rows k and l of A coincide). Labeling each row by its index in A, the order is

$1, \dots, k-1, k+1, \dots, l-1, l, l+1, \dots, n$ in the case of $\widetilde A_{kj}$,
$1, \dots, k-1, k, k+1, \dots, l-1, l+1, \dots, n$ in the case of $\widetilde A_{lj}$.

We can transform the first matrix into the second by transposing the index for row l to the left one step at a time until it gets to the kth position. The number of steps is $l - k - 1$, and therefore $\det \widetilde A_{lj} = (-1)^{l-k-1} \det \widetilde A_{kj}$. Consequently

$$(-1)^{k+j} A_{kj} \det \widetilde A_{kj} + (-1)^{l+j} A_{lj} \det \widetilde A_{lj} = (-1)^{k+j} (A_{kj} - A_{lj}) \det \widetilde A_{kj}.$$

The right side is 0 since $A_{kj} = A_{lj}$, and the proof is complete. □
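As a small sanity check of Proposition 2.36 (and of Proposition 2.35), the following Python sketch implements both cofactor expansions recursively and compares them on one matrix. The matrix A and the helper names (`minor`, `det_by_column`, `det_by_row`, `transpose`) are illustrative choices, not notation from the text, and indices are 0-based. Because the recursion mirrors the iterated formula, it takes on the order of n! steps, in line with the Remarks, so it is only meant for small n.

```python
# Cofactor expansion (Proposition 2.36), written recursively.
# Indices are 0-based; (-1)**(i + j) has the same parity as (-1)**((i+1) + (j+1)).

def minor(A, i, j):
    """Matrix obtained from A by deleting row i and column j."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det_by_column(A, j=0):
    """Expansion in cofactors about column j (Proposition 2.36a)."""
    n = len(A)
    if n == 1:
        return A[0][0]
    return sum((-1) ** (i + j) * A[i][j] * det_by_column(minor(A, i, j))
               for i in range(n))

def det_by_row(A, i=0):
    """Expansion in cofactors about row i (Proposition 2.36b)."""
    n = len(A)
    if n == 1:
        return A[0][0]
    return sum((-1) ** (i + j) * A[i][j] * det_by_row(minor(A, i, j))
               for j in range(n))

def transpose(A):
    return [list(col) for col in zip(*A)]

A = [[2, -1, 0], [1, 3, 4], [5, 0, -2]]
print([det_by_column(A, j) for j in range(3)])  # [-34, -34, -34]
print([det_by_row(A, i) for i in range(3)])     # [-34, -34, -34]
print(det_by_row(transpose(A)))                 # -34, as Proposition 2.35 predicts
```

Every expansion, about any row or any column, returns the same value, which is the content of parts (a) and (b).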


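Corollary 2.37 (Vandermonde matrix and determinant). If $r_1, \dots, r_n$ are scalars, then

$$\det \begin{pmatrix} 1 & 1 & \cdots & 1 \\ r_1 & r_2 & \cdots & r_n \\ r_1^2 & r_2^2 & \cdots & r_n^2 \\ \vdots & \vdots & & \vdots \\ r_1^{n-1} & r_2^{n-1} & \cdots & r_n^{n-1} \end{pmatrix} = \prod_{j>i} (r_j - r_i).$$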


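PROOF. We show that the determinant is

$$\prod_{j>1} (r_j - r_1) \det \begin{pmatrix} 1 & \cdots & 1 \\ r_2 & \cdots & r_n \\ \vdots & & \vdots \\ r_2^{n-2} & \cdots & r_n^{n-2} \end{pmatrix},$$

and then the result follows by induction. In the given matrix, replace the nth row by the sum of it and $-r_1$ times the (n − 1)st row, then the (n − 1)st row by the sum of it and $-r_1$ times the (n − 2)nd row, and so on. The resulting determinant is

$$\det \begin{pmatrix} 1 & 1 & \cdots & 1 \\ 0 & r_2 - r_1 & \cdots & r_n - r_1 \\ \vdots & \vdots & & \vdots \\ 0 & r_2^{n-2} - r_1 r_2^{n-3} & \cdots & r_n^{n-2} - r_1 r_n^{n-3} \\ 0 & r_2^{n-1} - r_1 r_2^{n-2} & \cdots & r_n^{n-1} - r_1 r_n^{n-2} \end{pmatrix}$$

$$= \det \begin{pmatrix} r_2 - r_1 & \cdots & r_n - r_1 \\ \vdots & & \vdots \\ r_2^{n-2} - r_1 r_2^{n-3} & \cdots & r_n^{n-2} - r_1 r_n^{n-3} \\ r_2^{n-1} - r_1 r_2^{n-2} & \cdots & r_n^{n-1} - r_1 r_n^{n-2} \end{pmatrix} \qquad \text{by Proposition 2.36a applied with } j = 1$$

$$= (r_2 - r_1) \cdots (r_n - r_1) \det \begin{pmatrix} 1 & \cdots & 1 \\ r_2 & \cdots & r_n \\ \vdots & & \vdots \\ r_2^{n-2} & \cdots & r_n^{n-2} \end{pmatrix},$$

the last step following by multilinearity of the determinant in the columns (as a consequence of Proposition 2.35 and multilinearity in the rows), since the column involving $r_k$ has the common factor $r_k - r_1$. □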
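The identity of Corollary 2.37 can be spot-checked numerically. This sketch assumes integer scalars $r_1, \dots, r_n$ (chosen arbitrarily), builds the matrix with row p holding the pth powers as in the statement, and compares a cofactor-expansion determinant against the product over pairs $j > i$. The function names are hypothetical.

```python
from itertools import combinations
from math import prod

def det(A):
    """Determinant by cofactor expansion about the first column (Proposition 2.36a)."""
    n = len(A)
    if n == 1:
        return A[0][0]
    return sum((-1) ** i * A[i][0] * det([row[1:] for k, row in enumerate(A) if k != i])
               for i in range(n))

def vandermonde(rs):
    """Row p (p = 0, ..., n-1) holds the pth powers r_1**p, ..., r_n**p."""
    return [[r ** p for r in rs] for p in range(len(rs))]

def vandermonde_product(rs):
    """Product of r_j - r_i over all index pairs with j > i."""
    return prod(rs[j] - rs[i] for i, j in combinations(range(len(rs)), 2))

rs = [2, 3, 5, 7]
print(det(vandermonde(rs)), vandermonde_product(rs))  # 240 240
print(det(vandermonde([1, 2, 1])))                    # 0: a repeated scalar gives equal columns
```

The second call illustrates why distinctness of the scalars matters later, in the proof of Proposition 2.41.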


The classical adjoint of the square matrix A, denoted by $A^{\mathrm{adj}}$, is the matrix with entries $(A^{\mathrm{adj}})_{ij} = (-1)^{i+j} \det \widetilde A_{ji}$, with $\widetilde A_{kl}$ defined as in the statement of Proposition 2.36: $\widetilde A_{kl}$ is the matrix A with the kth row and lth column deleted.

In the 2-by-2 case, we have

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}^{\mathrm{adj}} = \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}.$$

Thus we have $A A^{\mathrm{adj}} = A^{\mathrm{adj}} A = (\det A) I$ in the 2-by-2 case. Cramer’s rule for solving simultaneous linear equations results from the n-by-n generalization of this formula.


Proposition 2.38 (Cramer’s rule). If A is an n-by-n matrix, then $A A^{\mathrm{adj}} = A^{\mathrm{adj}} A = (\det A) I$, and thus $\det A \neq 0$ implies $A^{-1} = (\det A)^{-1} A^{\mathrm{adj}}$. Consequently if $\det A \neq 0$, then the unique solution of the simultaneous system $Ax = b$ of n equations in n unknowns, in which $x = (x_1, \dots, x_n)^t$ and $b = (b_1, \dots, b_n)^t$, has

$$x_j = \frac{\det B_j}{\det A},$$

with $B_j$ equal to the n-by-n matrix obtained from A by replacing the jth column of A by b.

Remarks. If we think of the calculation of the determinant of an n-by-n matrix as requiring about $n^3$ steps, then application of Cramer’s rule, at least if done in an unthinking fashion, suggests that solving an invertible system requires about $n^3(n+1)$ steps, since n + 1 determinants are involved in the explicit solution. Using row reduction directly to solve the system is more efficient than proceeding this way. Thus Cramer’s rule is more important for its theoretical applications than it is for making computations. One simple theoretical application is the observation that each entry of the inverse of a matrix is a polynomial function of the entries divided by the determinant.

PROOF. The (i, j)th entry of $A^{\mathrm{adj}} A$ is

$$(A^{\mathrm{adj}} A)_{ij} = \sum_{k=1}^n (A^{\mathrm{adj}})_{ik} A_{kj} = \sum_{k=1}^n (-1)^{i+k} (\det \widetilde A_{ki}) A_{kj}.$$
If i = j, then expansion in cofactors about the jth column (Proposition 2.36a) identifies the right side as det A. If i ≠ j, consider the matrix B obtained from A by replacing the ith column of A by the jth column. Then the ith and jth columns of B are equal, and hence det B = 0. Expanding det B in cofactors about the ith column (Proposition 2.36a), and noting that $\widetilde B_{ki} = \widetilde A_{ki}$ and $B_{ki} = A_{kj}$, we obtain

$$0 = \det B = \sum_{k=1}^n (-1)^{i+k} (\det \widetilde B_{ki}) B_{ki} = \sum_{k=1}^n (-1)^{i+k} (\det \widetilde A_{ki}) A_{kj}.$$

Thus $A^{\mathrm{adj}} A = (\det A) I$. A similar argument proves that $A A^{\mathrm{adj}} = (\det A) I$.

For the application to $Ax = b$, we multiply both sides on the left by $A^{\mathrm{adj}}$ and obtain $(\det A) x = A^{\mathrm{adj}} b$. Hence

$$(\det A) x_j = \sum_{i=1}^n (A^{\mathrm{adj}})_{ji} b_i = \sum_{i=1}^n (-1)^{i+j} b_i \det \widetilde A_{ij},$$

and the right side equals $\det B_j$ by expansion in cofactors of $\det B_j$ about the jth column (Proposition 2.36a). □
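A minimal Python sketch of the classical adjoint and of Cramer’s rule, assuming integer entries and exact rational arithmetic via `fractions.Fraction`. The example system and all function names are hypothetical. Note the transposed indices in `adjugate`, matching $(A^{\mathrm{adj}})_{ij} = (-1)^{i+j} \det \widetilde A_{ji}$. As the Remarks point out, this is far from an efficient way to solve a system; it is meant only to make the formulas concrete.

```python
from fractions import Fraction

def minor(A, i, j):
    """Delete row i and column j (0-based indices)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det(A):
    """Cofactor expansion about the first column; the empty matrix has determinant 1."""
    n = len(A)
    if n == 0:
        return 1
    return sum((-1) ** i * A[i][0] * det(minor(A, i, 0)) for i in range(n))

def adjugate(A):
    """Classical adjoint: entry (i, j) is (-1)**(i+j) times det of A with row j, column i deleted."""
    n = len(A)
    return [[(-1) ** (i + j) * det(minor(A, j, i)) for j in range(n)] for i in range(n)]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def cramer(A, b):
    """Solve Ax = b via x_j = det(B_j) / det(A), with B_j = A having column j replaced by b."""
    d = det(A)
    if d == 0:
        raise ValueError("Cramer's rule requires det A != 0")
    n = len(A)
    return [Fraction(det([row[:j] + [b[i]] + row[j + 1:] for i, row in enumerate(A)]), d)
            for j in range(n)]

A = [[2, 1, 1], [1, 3, 2], [1, 0, 0]]
b = [4, 5, 6]
print(det(A))                  # -1
print(matmul(A, adjugate(A)))  # [[-1, 0, 0], [0, -1, 0], [0, 0, -1]], i.e. (det A) I
print(cramer(A, b))            # [Fraction(6, 1), Fraction(15, 1), Fraction(-23, 1)]
```

Substituting $x = (6, 15, -23)$ back into the three equations confirms the solution.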


8.  Eigenvectors  and  Characteristic  Polynomials 

A vector $v \neq 0$ in $\mathbb F^n$ is an eigenvector of the n-by-n matrix A if $Av = \lambda v$ for some scalar λ. We call λ the eigenvalue associated with v. When λ is an eigenvalue, the vector space of all v with $Av = \lambda v$, i.e., the set consisting of the eigenvectors and the 0 vector, is called the eigenspace for λ.

If we think of A as giving a linear map L from $\mathbb F^n$ to itself, an eigenvector takes on geometric significance as a vector mapped to a multiple of itself by L. Another geometric way of viewing matters is that the eigenvector yields a 1-dimensional subspace $U = \mathbb F v$ that is invariant, or stable, under L in the sense of satisfying $L(U) \subseteq U$.


Proposition 2.39. An n-by-n matrix A has an eigenvector with eigenvalue λ if and only if $\det(\lambda I - A) = 0$. In this case the eigenspace for λ is the kernel of $\lambda I - A$.

PROOF. We have $Av = \lambda v$ if and only if $(\lambda I - A)v = 0$, if and only if v is in $\ker(\lambda I - A)$. This kernel is nonzero if and only if $\det(\lambda I - A) = 0$. □
With A fixed, the expression $\det(\lambda I - A)$ is a polynomial in λ of degree n and is called the characteristic polynomial⁸ of A. To see that it is at least a polynomial function of λ, let us expand $\det(\lambda I - A)$ as

$$\det \begin{pmatrix} \lambda - A_{11} & -A_{12} & \cdots & -A_{1n} \\ -A_{21} & \lambda - A_{22} & \cdots & -A_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ -A_{n1} & -A_{n2} & \cdots & \lambda - A_{nn} \end{pmatrix} = \sum_{\sigma} (\operatorname{sgn} \sigma)\, \mathrm{term}_{1,\sigma(1)} \cdots \mathrm{term}_{n,\sigma(n)}.$$

The term for the permutation σ = 1 has σ(k) = k for every k and gives $\prod_{j=1}^n (\lambda - A_{jj})$. All other σ’s have σ(k) = k for at most n − 2 values of k, and λ therefore occurs at most n − 2 times. Thus the above expression is

$$\prod_{j=1}^n (\lambda - A_{jj}) + \left\{ \begin{array}{c} \text{other terms with powers} \\ \text{of } \lambda \text{ at most } n-2 \end{array} \right\} = \lambda^n - \Big( \sum_{j=1}^n A_{jj} \Big) \lambda^{n-1} + \left\{ \begin{array}{c} \text{terms with powers of} \\ \lambda \text{ from } n-2 \text{ to } 1 \end{array} \right\} + (-1)^n \det A.$$

The constant term is $(-1)^n \det A$ as indicated because it is the value of the polynomial at λ = 0, which is $\det(-A)$. In any event, we now see that characteristic polynomials are polynomial functions. Starting in Chapter V, we shall treat them as polynomials in one indeterminate in the sense⁹ of Section I.3; for now, we are calling the indeterminate λ, but later as our point of view evolves, we shall start calling it X. The negative of the coefficient of $\lambda^{n-1}$ is the trace of A, denoted by Tr A. Thus $\operatorname{Tr} A = \sum_{j=1}^n A_{jj}$. Trace is a linear functional on the vector space $M_{nn}(\mathbb F)$ of n-by-n matrices.
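The text identifies the coefficients of $\lambda^{n-1}$ and $\lambda^0$ by expanding the determinant. As an independent check, the sketch below computes all coefficients of $\det(\lambda I - A)$ with the Faddeev-LeVerrier recursion, a standard method that is not taken from the text and is used here only for illustration. Exact `Fraction` arithmetic is assumed, and the 3-by-3 matrix is a hypothetical example (its trace is 3 and its determinant is −34). The output should show the coefficient of $\lambda^{n-1}$ equal to −Tr A and the constant term equal to $(-1)^n \det A$.

```python
from fractions import Fraction

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def char_poly(A):
    """Coefficients [c_n, c_{n-1}, ..., c_0] of det(lambda*I - A), highest power first.

    Faddeev-LeVerrier recursion: M_0 = 0 and c_n = 1; for k = 1, ..., n,
    M_k = A M_{k-1} + c_{n-k+1} I  and  c_{n-k} = -tr(A M_k) / k.
    """
    n = len(A)
    M = [[0] * n for _ in range(n)]
    coeffs = [Fraction(1)]
    for k in range(1, n + 1):
        AM = matmul(A, M)
        # coeffs[-1] is c_{n-k+1} at this point
        M = [[AM[i][j] + (coeffs[-1] if i == j else 0) for j in range(n)] for i in range(n)]
        AM = matmul(A, M)
        coeffs.append(-Fraction(sum(AM[i][i] for i in range(n)), k))
    return coeffs

A1 = [[4, 1], [-2, 1]]                    # the matrix of Example 1 below
A3 = [[2, -1, 0], [1, 3, 4], [5, 0, -2]]  # trace 3, determinant -34
print(char_poly(A1))  # coefficients 1, -5, 6
print(char_poly(A3))  # coefficients 1, -3, -3, 34
```

For A3 the second coefficient is −3 = −Tr A3 and the constant term is 34 = (−1)³(−34), as the expansion above predicts.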

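Example 1. For $A = \begin{pmatrix} 4 & 1 \\ -2 & 1 \end{pmatrix}$, the characteristic polynomial is

$$\det(\lambda I - A) = \det \begin{pmatrix} \lambda - 4 & -1 \\ 2 & \lambda - 1 \end{pmatrix} = (\lambda - 4)(\lambda - 1) + 2 = \lambda^2 - 5\lambda + 6 = (\lambda - 2)(\lambda - 3).$$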


⁸Some authors call $\det(A - \lambda I)$ the characteristic polynomial. This is the same polynomial as $\det(\lambda I - A)$ if n is even and is the negative of it if n is odd. The choice made here has the slight advantage of always having leading coefficient 1, which is a handy property in some situations.

⁹In Chapter V we will allow determinants of matrices whose entries are from any “commutative ring with identity,” $\mathbb C[X]$ being an example. Then we can think of $\det(XI - A)$ directly as involving an indeterminate X and not initially as a function of a scalar λ.
The roots, and hence the eigenvalues, are λ = 2 and λ = 3. The eigenvectors for λ = 2 are computed by solving $(2I - A)v = 0$. The method of row reduction gives

$$\left( \begin{array}{cc|c} 2-4 & -1 & 0 \\ 2 & 2-1 & 0 \end{array} \right) = \left( \begin{array}{cc|c} -2 & -1 & 0 \\ 2 & 1 & 0 \end{array} \right) \to \left( \begin{array}{cc|c} 1 & \tfrac12 & 0 \\ 0 & 0 & 0 \end{array} \right).$$

Thus $x_1 + \tfrac12 x_2 = 0$, i.e., $x_1 = -\tfrac12 x_2$. So the eigenvectors for λ = 2 are the nonzero vectors of the form $\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = x_2 \begin{pmatrix} -\tfrac12 \\ 1 \end{pmatrix}$. Similarly we find the eigenvectors for λ = 3 by starting from $(3I - A)v = 0$ and solving. The result is that the eigenvectors for λ = 3 are the nonzero vectors of the form

$$\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = x_2 \begin{pmatrix} -1 \\ 1 \end{pmatrix}.$$

For this example, there is a basis of eigenvectors.
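The eigenvectors found in Example 1 can be checked directly. This short sketch (taking $x_2 = 1$ in each family, an arbitrary choice) verifies $Av = \lambda v$ exactly and computes the determinant of the matrix whose columns are the two eigenvectors; a nonzero value confirms that they form a basis.

```python
from fractions import Fraction

A = [[4, 1], [-2, 1]]  # the matrix of Example 1

def apply(M, v):
    """Matrix-vector product M v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

v2 = [Fraction(-1, 2), 1]  # x_2 = 1 in the family for lambda = 2
v3 = [-1, 1]               # x_2 = 1 in the family for lambda = 3

print(apply(A, v2) == [2 * x for x in v2])  # True
print(apply(A, v3) == [3 * x for x in v3])  # True
# Determinant of the 2-by-2 matrix with columns v2, v3; nonzero means a basis.
print(v2[0] * v3[1] - v2[1] * v3[0])        # 1/2
```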


Corollary  2.40.  An  n-by-n  matrix  A  has  at  most  n  eigenvalues. 

PROOF. Since $\det(\lambda I - A)$ is a polynomial of degree n, this follows from Proposition 2.39 and Corollary 1.14. □


It will later be of interest that certain matrices A have a basis of eigenvectors. Such a basis exists for A as in Example 1 but not in general. One thing that can prevent a matrix from having a basis of eigenvectors is the failure of the characteristic polynomial to factor into first-degree factors. Thus, for example, $A = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$ has characteristic polynomial $\lambda^2 + 1$, which does not factor into first-degree factors when $\mathbb F = \mathbb R$. Even when we do have a factorization into first-degree factors, we can still fail to have a basis of eigenvectors, as the following example shows.


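Example 2. For $A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$, the characteristic polynomial is given by $\det(\lambda I - A) = \det \begin{pmatrix} \lambda - 1 & -1 \\ 0 & \lambda - 1 \end{pmatrix} = (\lambda - 1)^2$. When we solve for eigenvectors, we get

$$\left( \begin{array}{cc|c} 0 & -1 & 0 \\ 0 & 0 & 0 \end{array} \right) \to \left( \begin{array}{cc|c} 0 & 1 & 0 \\ 0 & 0 & 0 \end{array} \right),$$

and $x_2 = 0$. Thus $\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = x_1 \begin{pmatrix} 1 \\ 0 \end{pmatrix}$, and we do not have a basis of eigenvectors.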
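One way to quantify the failure in Example 2 is to compute the dimension of each eigenspace, which by Proposition 2.39 is the dimension of the kernel of $\lambda I - A$, that is, n minus the rank of $\lambda I - A$ (rank-nullity, Corollary 2.15). The sketch below assumes exact row reduction over the rationals; the helper names `rank` and `eigenspace_dim` are hypothetical.

```python
from fractions import Fraction

def rank(M):
    """Rank of M by row reduction with exact rational arithmetic."""
    R = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(R[0]) if R else 0):
        pivot = next((i for i in range(r, len(R)) if R[i][c] != 0), None)
        if pivot is None:
            continue
        R[r], R[pivot] = R[pivot], R[r]
        for i in range(len(R)):
            if i != r and R[i][c] != 0:
                f = R[i][c] / R[r][c]
                R[i] = [a - f * p for a, p in zip(R[i], R[r])]
        r += 1
    return r

def eigenspace_dim(A, lam):
    """dim ker(lam*I - A) = n - rank(lam*I - A)."""
    n = len(A)
    return n - rank([[(lam if i == j else 0) - A[i][j] for j in range(n)] for i in range(n)])

print(eigenspace_dim([[1, 1], [0, 1]], 1))   # 1: Example 2, although (lambda - 1)^2 divides
print(eigenspace_dim([[4, 1], [-2, 1]], 2))  # 1: Example 1
print(eigenspace_dim([[4, 1], [-2, 1]], 3))  # 1: Example 1
```

In Example 2 the eigenspace dimensions add up to 1 < 2, while in Example 1 they add up to 2, matching the presence or absence of a basis of eigenvectors.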


What happens is that the presence of a factor $(\lambda - c)^k$, with k as large as possible, in the characteristic polynomial ensures the existence of an r-parameter family of eigenvectors for eigenvalue c, with $1 \le r \le k$, but not necessarily with r = k. Example 2 shows that r can be strictly less than k. For purposes of deciding whether there is a basis of eigenvectors, the positive result is that the different roots of the characteristic polynomial do not interfere with each other; this is a consequence of the following proposition.

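Proposition 2.41. If A is an n-by-n matrix, then eigenvectors for distinct eigenvalues are linearly independent.

Remark. It follows that if the characteristic polynomial of A has n distinct roots, then $\mathbb F^n$ has a basis of eigenvectors of A.

PROOF. Let $Av_1 = \lambda_1 v_1, \dots, Av_k = \lambda_k v_k$ with $\lambda_1, \dots, \lambda_k$ distinct, and suppose that

$$c_1 v_1 + \cdots + c_k v_k = 0.$$

Applying A repeatedly gives

$$\begin{aligned} c_1 \lambda_1 v_1 + \cdots + c_k \lambda_k v_k &= 0, \\ c_1 \lambda_1^2 v_1 + \cdots + c_k \lambda_k^2 v_k &= 0, \\ &\ \ \vdots \\ c_1 \lambda_1^{k-1} v_1 + \cdots + c_k \lambda_k^{k-1} v_k &= 0. \end{aligned}$$

If the jth entry of $v_i$ is denoted by $v_i^{(j)}$, this system of vector equations says that

$$\begin{pmatrix} 1 & \cdots & 1 \\ \lambda_1 & \cdots & \lambda_k \\ \vdots & & \vdots \\ \lambda_1^{k-1} & \cdots & \lambda_k^{k-1} \end{pmatrix} \begin{pmatrix} c_1 v_1^{(j)} \\ \vdots \\ c_k v_k^{(j)} \end{pmatrix} = \begin{pmatrix} 0 \\ \vdots \\ 0 \end{pmatrix} \qquad \text{for } 1 \le j \le n.$$

The square matrix on the left side is a Vandermonde matrix, which is invertible by Corollary 2.37 since $\lambda_1, \dots, \lambda_k$ are distinct. Therefore $c_i v_i^{(j)} = 0$ for all i and j. Each $v_i$ is nonzero in some entry $v_i^{(j)}$, with j perhaps depending on i, and hence $c_i = 0$. Since all the coefficients $c_i$ have to be 0, $v_1, \dots, v_k$ are linearly independent. □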
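To see Proposition 2.41 in a concrete case, the following sketch uses a hypothetical upper triangular 3-by-3 matrix with distinct eigenvalues 1, 2, 3 and eigenvectors worked out by hand from $(\lambda I - A)v = 0$. It confirms $Av = \lambda v$ for each and that the matrix with these eigenvectors as columns has nonzero determinant, so the three eigenvectors form a basis, as the Remark predicts.

```python
def det(M):
    """Cofactor expansion about the first column (Proposition 2.36a)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** i * M[i][0] * det([row[1:] for k, row in enumerate(M) if k != i])
               for i in range(len(M)))

def apply(M, v):
    """Matrix-vector product M v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

# Hypothetical example: upper triangular, so the eigenvalues are the diagonal entries.
A = [[1, 1, 0], [0, 2, 1], [0, 0, 3]]
# Eigenvectors found by hand by solving (lambda*I - A)v = 0.
eigen = {1: [1, 0, 0], 2: [1, 1, 0], 3: [1, 2, 2]}

for lam, v in eigen.items():
    assert apply(A, v) == [lam * x for x in v], lam

# Matrix whose columns are the eigenvectors; nonzero determinant means independence.
P = [list(col) for col in zip(*eigen.values())]
print(det(P))  # 2
```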


The theory of eigenvectors and eigenvalues for square matrices allows us to develop a corresponding theory for linear maps $L: V \to V$, where V is an n-dimensional vector space over $\mathbb F$. If L is such a function, a vector $v \neq 0$ in V is an eigenvector of L if $L(v) = \lambda v$ for some scalar λ. We call λ the eigenvalue. When λ is an eigenvalue, the vector space of all v with $L(v) = \lambda v$ is called the eigenspace for λ under L. We can compute the eigenvalues and eigenvectors of L by working in any ordered basis Γ of V. The equation $L(v) = \lambda v$ becomes

$$\begin{pmatrix} L \\ \Gamma\Gamma \end{pmatrix} \begin{pmatrix} v \\ \Gamma \end{pmatrix} = \lambda \begin{pmatrix} v \\ \Gamma \end{pmatrix}$$

and is satisfied if and only if the column vector $\begin{pmatrix} v \\ \Gamma \end{pmatrix}$ is an eigenvector of the matrix $A = \begin{pmatrix} L \\ \Gamma\Gamma \end{pmatrix}$ with eigenvalue λ.


Applying Proposition 2.39 and remembering that determinants are well defined on linear maps $L: V \to V$, we see that L has an eigenvector with eigenvalue λ if and only if $\det(\lambda I - L) = 0$ and that in this case the eigenspace is the kernel of $\lambda I - L$.

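What happens if we make these computations in a different ordered basis Δ? We know from Proposition 2.17 that the matrices $A = \begin{pmatrix} L \\ \Gamma\Gamma \end{pmatrix}$ and $B = \begin{pmatrix} L \\ \Delta\Delta \end{pmatrix}$ are similar, related by $B = C^{-1} A C$, where $C = \begin{pmatrix} I \\ \Gamma\Delta \end{pmatrix}$. Computing with A leads to $u = \begin{pmatrix} v \\ \Gamma \end{pmatrix}$ as eigenvector for the eigenvalue λ. The corresponding result for B is that

$$B(C^{-1} u) = C^{-1} A C C^{-1} u = C^{-1} A u = \lambda C^{-1} u.$$

Thus

$$C^{-1} u = \begin{pmatrix} I \\ \Delta\Gamma \end{pmatrix} \begin{pmatrix} v \\ \Gamma \end{pmatrix} = \begin{pmatrix} v \\ \Delta \end{pmatrix}$$

is an eigenvector of B with eigenvalue λ, just as it should be.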
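The computation $B(C^{-1}u) = \lambda C^{-1}u$ can be tried on the matrix of Example 1. In this sketch, C is a hypothetical invertible matrix standing in for the change-of-basis matrix, u is the λ = 3 eigenvector from Example 1, and exact `Fraction` arithmetic is assumed.

```python
from fractions import Fraction

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def inverse_2x2(C):
    """Inverse of a 2-by-2 matrix: its classical adjoint divided by its determinant."""
    (a, b), (c, d) = C
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[4, 1], [-2, 1]]            # Example 1, playing the role of the matrix in the basis Gamma
C = [[1, 1], [0, 1]]             # hypothetical change-of-basis matrix
C_inv = inverse_2x2(C)
B = matmul(matmul(C_inv, A), C)  # B = C^{-1} A C

u = [[-1], [1]]                  # column eigenvector of A for lambda = 3
w = matmul(C_inv, u)             # C^{-1} u
print(B)                         # entries 6, 6, -2, -1 (as Fractions)
print(matmul(B, w) == [[3 * x for x in row] for row in w])  # True
```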

These considerations about eigenvalues suggest some facts about similar matrices that we can observe more directly without first passing from matrices to linear maps. One is that similar matrices have the same characteristic polynomial. To see this, suppose that $B = C^{-1} A C$; then

$$\begin{aligned} \det(\lambda I - B) &= \det(\lambda I - C^{-1} A C) = \det(C^{-1} (\lambda I - A) C) \\ &= (\det C^{-1}) \det(\lambda I - A) (\det C) \\ &= (\det C^{-1})(\det C) \det(\lambda I - A) = \det(\lambda I - A). \end{aligned}$$

A second fact is that similar matrices have the same trace. Indeed, the trace is the negative of the coefficient of $\lambda^{n-1}$ in the characteristic polynomial, and the characteristic polynomials are the same.
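The invariance of the characteristic polynomial under similarity can likewise be checked on an example. This sketch reuses the Faddeev-LeVerrier recursion (a standard technique, not the method of the text) together with an invertible unitriangular matrix C whose inverse was worked out by hand; A and C are hypothetical choices.

```python
from fractions import Fraction

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def char_poly(A):
    """Coefficients of det(lambda*I - A), highest power first (Faddeev-LeVerrier)."""
    n = len(A)
    M = [[0] * n for _ in range(n)]
    coeffs = [Fraction(1)]
    for k in range(1, n + 1):
        AM = matmul(A, M)
        M = [[AM[i][j] + (coeffs[-1] if i == j else 0) for j in range(n)] for i in range(n)]
        AM = matmul(A, M)
        coeffs.append(-Fraction(sum(AM[i][i] for i in range(n)), k))
    return coeffs

A = [[2, -1, 0], [1, 3, 4], [5, 0, -2]]
C = [[1, 2, 0], [0, 1, 3], [0, 0, 1]]        # hypothetical invertible matrix
C_inv = [[1, -2, 6], [0, 1, -3], [0, 0, 1]]  # its inverse, worked out by hand
assert matmul(C, C_inv) == [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

B = matmul(matmul(C_inv, A), C)              # B = C^{-1} A C
print(char_poly(A) == char_poly(B))          # True
print(sum(B[i][i] for i in range(3)))        # 3, the trace of A
```

Although B looks quite different from A entry by entry, the characteristic polynomial and hence the trace are unchanged.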

Because of these considerations we are free in the future to speak of the characteristic polynomial, the eigenvalues, and the trace of a linear map from a finite-dimensional vector space to itself, as well as the determinant, and these notions do not depend on any choice of ordered basis. We can also speak unambiguously of the eigenvectors of such a linear map. For this notion the realization of the eigenvectors in an ordered basis as column vectors depends on the ordered basis, the dependence being given by the formulas two paragraphs before the present one.

One final remark is in order. When the scalars are taken to be the complex numbers $\mathbb C$, the Fundamental Theorem of Algebra (Theorem 1.18) is applicable: every polynomial of degree ≥ 1 has at least one root. When applied to the characteristic polynomial of a square matrix or a linear map from a finite-dimensional vector space to itself, this theorem tells us that the matrix or linear map always has at least one eigenvalue, hence an eigenvector. We shall make serious use of this fact in Chapter III.


9.  Bases  in  the  Infinite-Dimensional  Case 

So  far  in  this  chapter,  the  use  of  bases  has  been  limited  largely  to  vector  spaces 
having  a  finite  spanning  set.  In  this  case  we  know  from  Corollary  2.3  that  the 
finite  spanning  set  has  a  subset  that  is  a  basis,  any  linearly  independent  set  can  be 
extended  to  a  basis,  and  any  two  bases  have  the  same  finite  number  of  elements. 
We  called  such  spaces  finite-dimensional  and  defined  the  dimension  of  the  vector 
space  to  be  the  number  of  elements  in  a  basis. 

The  first  objective  in  this  section  is  to  prove  analogs  of  these  results  in  the 
infinite-dimensional  case.  We  shall  make  use  of  Zorn’s  Lemma  as  in  Section  A5 
of  the  appendix,  as  well  as  the  notion  of  cardinality  discussed  in  Section  A6  of  the 
appendix.  Once  these  analogs  are  in  place,  we  shall  examine  the  various  results 
that  we  proved  about  finite-dimensional  spaces  to  see  the  extent  to  which  they 
remain  valid  for  infinite-dimensional  spaces. 

Theorem  2.42.  If  V  is  any  vector  space  over  F,  then 

(a)  any  spanning  set  in  V  has  a  subset  that  is  a  basis, 

(b)  any  linearly  independent  set  in  V  can  be  extended  to  a  basis, 

(c)  V  has  a  basis, 

(d)  any  two  bases  have  the  same  cardinality. 

Remarks.  The  common  cardinality  mentioned  in  (d)  is  called  the  dimension 
of the vector space V. In many applications it is enough to use $+\infty$ in place of
each  infinite  cardinal  in  dimension  formulas.  This  was  the  attitude  conveyed  in 
the  remark  with  Corollary  2.24. 

PROOF. For (b), let E be the given linearly independent set, and let S be the collection of all linearly independent subsets of V that contain E. Partially order S by inclusion upward. The set S is nonempty because E is in S. Let T be a chain in S, and let A be the union of the members of T. We show that A is in S, and then A is certainly an upper bound of T. Because of its definition, A contains E, and we are to prove that A is linearly independent. For A to fail to be linearly independent would mean that there are vectors $v_1, \dots, v_n$ in A with $c_1 v_1 + \cdots + c_n v_n = 0$ for some system of scalars not all 0. Let $v_j$ be in the


9.  Bases  in  the  Infinite-Dimensional  Case 


19 


member $A_j$ of the chain T. Since $A_1 \subseteq A_2$ or $A_2 \subseteq A_1$, $v_1$ and $v_2$ are both in $A_1$ or both in $A_2$. To keep the notation neutral, say they are both in $A_2'$. Since $A_2' \subseteq A_3$ or $A_3 \subseteq A_2'$, all of $v_1, v_2, v_3$ are in $A_2'$ or they are all in $A_3$. Say they are all in $A_3'$. Continuing in this way, we arrive at one of the sets $A_1, \dots, A_n$, say $A_n'$, such that all of $v_1, \dots, v_n$ are in $A_n'$. The members of $A_n'$ are linearly independent by assumption, and we obtain the contradiction $c_1 = \cdots = c_n = 0$. We conclude that A is linearly independent. Thus the chain T has an upper bound in S. By Zorn’s Lemma, S has a maximal element, say M. By Proposition 2.1a, M is a basis of V containing E.

For (a), let E be the given spanning set, and let S be the collection of all linearly independent subsets of V that are contained in E. Partially order S by inclusion upward. The set S is nonempty because ∅ is in S. Let T be a chain in S, and let A be the union of the members of T. We show that A is in S, and then A is certainly an upper bound of T. Because of its definition, A is contained in E, and the same argument as in the previous paragraph shows that A is linearly independent. Thus the chain T has an upper bound in S. By Zorn’s Lemma, S has a maximal element, say M. Proposition 2.1a is not applicable, but its proof is easily adjusted to apply here to show that M spans V and hence is a basis: Given v in V, we are to prove that v lies in the linear span of M. First suppose that v is in E. If v is in M, there is nothing to prove. Otherwise, since $M \cup \{v\}$ is contained in E, the assumed maximality implies that $M \cup \{v\}$ is not linearly independent, and hence $cv + c_1 v_1 + \cdots + c_n v_n = 0$ for some scalars $c, c_1, \dots, c_n$ not all 0 and for some vectors $v_1, \dots, v_n$ in M. The scalar c cannot be 0 since M is linearly independent. Thus $v = -c^{-1} c_1 v_1 - \cdots - c^{-1} c_n v_n$, and v is exhibited as lying in the linear span of M. Consequently every member of E lies in the linear span of M. Now suppose that v is not in E. Since v lies in the linear span of E and every member of E lies in the linear span of M, v lies in the linear span of M.

Conclusion (c) follows from (a) by taking the spanning set to be V; alternatively it follows from (b) by taking the linearly independent set to be ∅.

For (d), let $A = \{v_\alpha\}$ and $B = \{w_\beta\}$ be two bases of V. Each member a of A can be written as $a = c_1 w_{\beta_1} + \cdots + c_n w_{\beta_n}$ uniquely with the scalars $c_1, \dots, c_n$ nonzero and with each $w_{\beta_j}$ in B. Let $B_a$ be the finite subset $\{w_{\beta_1}, \dots, w_{\beta_n}\}$. Then we have associated to each member of A a finite subset $B_a$ of B. Let us see that $\bigcup_{a \in A} B_a = B$. If b is in B, then the linear span of $B - \{b\}$ is not all of V. Thus some v in V is not in this span. Expand v in terms of A as $v = d_1 v_{\alpha_1} + \cdots + d_m v_{\alpha_m}$ with all $d_j \neq 0$. Since v is not in the linear span of $B - \{b\}$, some $a_0 = v_{\alpha_{j_0}}$ with $1 \le j_0 \le m$ is not in this linear span. Then b is in $B_{a_0}$, and we conclude that $B = \bigcup_{a \in A} B_a$. By the corollary near the end of Section A6 of the appendix, card B ≤ card A. Reversing the roles of A and B, we obtain card A ≤ card B. By the Schroeder-Bernstein Theorem, A and B have the same cardinality. This proves (d). □
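Zorn’s Lemma gives no procedure for actually finding the basis in (b). In the finite-dimensional space $\mathbb Q^n$, though, a greedy loop does the job: adjoin each standard basis vector that keeps the list linearly independent. Afterward every standard basis vector lies in the span of the list, so the list spans. The sketch below is only such a finite-dimensional illustration of statement (b), assuming exact rational arithmetic and hypothetical helper names; it is not an implementation of the proof above.

```python
from fractions import Fraction

def rank(vectors):
    """Rank of a list of row vectors, by exact row reduction."""
    R = [[Fraction(x) for x in v] for v in vectors]
    r = 0
    for c in range(len(R[0]) if R else 0):
        pivot = next((i for i in range(r, len(R)) if R[i][c] != 0), None)
        if pivot is None:
            continue
        R[r], R[pivot] = R[pivot], R[r]
        for i in range(len(R)):
            if i != r and R[i][c] != 0:
                f = R[i][c] / R[r][c]
                R[i] = [a - f * p for a, p in zip(R[i], R[r])]
        r += 1
    return r

def extend_to_basis(independent, n):
    """Extend a linearly independent list in Q^n to a basis by adjoining
    standard basis vectors e_1, ..., e_n whenever they keep the list independent."""
    basis = [list(v) for v in independent]
    if rank(basis) != len(basis):
        raise ValueError("input vectors are not linearly independent")
    for i in range(n):
        e = [1 if k == i else 0 for k in range(n)]
        if rank(basis + [e]) == len(basis) + 1:
            basis.append(e)
    return basis

print(extend_to_basis([[1, 1, 0]], 3))  # [[1, 1, 0], [1, 0, 0], [0, 0, 1]]
```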
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Now  let  us  go  through  the  results  of  the  chapter  and  see  how  many  of  them 
extend  to  the  infinite-dimensional  case  and  why.  It  is  possible  but  not  very  useful 
in  the  infinite-dimensional  case  to  associate  an  infinite  “matrix”  to  a  linear  map 
when  bases  or  ordered  bases  are  specified  for  the  domain  and  range.  Because  this 
association  is  not  very  useful,  we  shall  not  attempt  to  extend  any  of  the  results 
concerning  matrices.  The  facts  concerning  extensions  of  results  just  dealing  with 
dimensions  and  linear  maps  are  as  follows: 

Corollary 2.5. If V is any vector space and U is a vector subspace, then dim U ≤ dim V.

In fact, take a basis of U and extend it to a basis of V; a basis of U is then exhibited as a subset of a basis of V, and the conclusion about cardinal-number dimensions follows.

Proposition 2.13. Let U and V be vector spaces over F, and let Γ be a basis of U. Then to each function $t: \Gamma \to V$ corresponds one and only one linear map $L: U \to V$ such that $L|_\Gamma = t$.

In  fact,  the  proof  given  in  Section  3  is  valid  with  no  assumption  about  finite 
dimensionality. 

Corollary 2.15. If $L: U \to V$ is a linear map between vector spaces over F, then

dim(domain(L)) = dim(kernel(L)) + dim(image(L)).

In fact, this formula remains valid, but the earlier proof via matrices has to be replaced. Instead, take a basis $\{v_\alpha \mid \alpha \in A\}$ of the kernel and extend it to a basis $\{v_\alpha \mid \alpha \in S\}$ of the domain, with $A \subseteq S$. It is routine to check that the vectors $L(v_\alpha)$ for $\alpha \in S - A$ are distinct and form a basis of the image of L.

Theorem  2.16  (part).  The  composition  of  two  linear  maps  is  linear. 

In  fact,  the  proof  in  Section  3  remains  valid  with  no  assumption  about  finite 
dimensionality. 

Proposition 2.18. Two vector spaces over F are isomorphic if and only if they have the same cardinal-number dimension.

In fact, this result follows from Proposition 2.13 just as it did in the finite-dimensional case; the only changes that are needed in the argument in Section 3 are small adjustments of the notation. Of course, one must not overinterpret this result on the basis of the remark with Theorem 2.42: two vector spaces with dimension $+\infty$ need not be isomorphic. Despite the apparently definitive sound of Proposition 2.18, one must not attach too much significance to it; vector spaces that arise in practice tend to have some additional structure, and an isomorphism based merely on equality of dimensions need not preserve the additional structure.
Proposition 2.19. If V is a vector space and V' is its dual, then dim V ≤ dim V'. (In the infinite-dimensional case we do not have equality.)

In fact, take a basis $\{v_\alpha\}$ of V. If for each α we define $v'_\alpha(v_\beta) = \delta_{\alpha\beta}$ and use Proposition 2.13 to form the linear extension $v'_\alpha$, then the set $\{v'_\alpha\}$ is a linearly independent subset of V' that is in one-one correspondence with the basis of V. Extending $\{v'_\alpha\}$ to a basis of V', we obtain the result.

Proposition 2.20. Let V be a vector space, and let U be a vector subspace of V. Then

(b) every linear functional on U extends to a linear functional on V,

(c) whenever $v_0$ is a member of V that is not in U, there exists a linear functional on V that is 0 on U and is 1 on $v_0$.

Conclusion  (a)  of  the  original  Proposition  2.20,  which  concerns  annihilators,  does 
not  extend  to  the  infinite-dimensional  case. 

To prove (b) without the finite dimensionality, let u' be a given linear functional on U, let $\{u_\alpha\}$ be a basis of U, and let $\{v_\beta\}$ be a subset of V such that $\{u_\alpha\} \cup \{v_\beta\}$ is a basis of V. Define $v'(u_\alpha) = u'(u_\alpha)$ for each α and $v'(v_\beta) = 0$ for each β. Using Proposition 2.13, let v' be the linear extension to a linear functional on V. Then v' has the required properties.

To prove (c) without the finite dimensionality, we take a basis $\{u_\alpha\}$ of U and extend $\{u_\alpha\} \cup \{v_0\}$, which is linearly independent because $v_0$ is not in U, to a basis of V. Define v' to equal 0 on each $u_\alpha$, to equal 1 on $v_0$, and to equal 0 on the remaining members of the basis of V. Then the linear extension of v' to V is the required linear functional.

Proposition 2.22. If V is any vector space over F, then the canonical map $\iota: V \to V''$ is one-one. The canonical map is not onto V'' if V is infinite-dimensional.

The proof that it is one-one given in Section 4 is applicable in the infinite-dimensional case since we know from Theorem 2.42 that any linearly independent subset of V can be extended to a basis. For the second conclusion when V has a countably infinite basis, see Problem 31 at the end of the chapter.

Proposition 2.23 through Corollary 2.29. For these results about quotients, the only place that finite dimensionality played a role was in the dimension formulas, Corollaries 2.24 and 2.29. We restate these two results separately.

Corollary  2.24.  If  V  is  a  vector  space  over  F  and  U  is  a  vector  subspace, 
then 

(a)  dim  V  =  dim  U  +  dim( V/U), 

(b)  the  subspace  U  is  the  kernel  of  some  linear  map  defined  on  V . 
The proof in Section 5 requires no changes: Let q be the quotient map. The
linear  map  q  meets  the  conditions  of  (b).  For  (a),  take  a  basis  of  U  and  extend  to 
a  basis  of  V.  Then  the  images  under  q  of  the  additional  vectors  form  a  basis  of 
V/U. 

Corollary 2.29. Let M and N be vector subspaces of a vector space V over F. Then

$$\dim(M + N) + \dim(M \cap N) = \dim M + \dim N.$$

In fact, Corollary 2.24a gives us $\dim(M + N) = \dim((M + N)/M) + \dim M$. Substituting $\dim((M + N)/M) = \dim(N/(M \cap N))$ from Theorem 2.28 and adding $\dim(M \cap N)$ to both sides, we obtain $\dim(M + N) + \dim(M \cap N) = \dim(M \cap N) + \dim(N/(M \cap N)) + \dim M$. The first two terms on the right side add to $\dim N$ by Corollary 2.24a, and the result follows.

Propositions  2.30  through  2.33.  These  results  about  direct  products  and 
direct  sums  did  not  assume  any  finite  dimensionality. 

The determinants of Sections 7-8 have no infinite-dimensional generalization, and Proposition 2.41 is the only result in those two sections with a valid infinite-dimensional analog. The valid analog in the infinite-dimensional case is that eigenvectors for distinct eigenvalues under a linear map are linearly independent. The proof given for Proposition 2.41 in Section 8 adapts to handle this analog, provided we interpret the components $v_i^{(j)}$ of a vector $v_i$ as the coefficients needed to expand $v_i$ in a basis of the underlying vector space.


10.  Problems 


1. Determine bases of the following subsets of $\mathbb R^3$:
(a) the plane $3x - 2y + 5z = 0$,
(b) the line $x = 2t$, $y = -t$, $z = 4t$, where $-\infty < t < \infty$.

2. This problem shows that the associativity law in the definition of “vector space” implies certain more complicated formulas of which the stated law is a special case. Let $v_1, \dots, v_n$ be vectors in a vector space V. The only vector-space properties that are to be used in this problem are associativity of addition and the existence of the 0 element.
(a) Define $v_{(k)}$ inductively upward by $v_{(0)} = 0$ and $v_{(k)} = v_{(k-1)} + v_k$, and define $v^{(l)}$ inductively downward by $v^{(n+1)} = 0$ and $v^{(l)} = v_l + v^{(l+1)}$. Prove that $v_{(k)} + v^{(k+1)}$ is always the same element for $0 \le k \le n$.
(b) Prove that the same element of V results from any way of inserting parentheses in the sum $v_1 + \cdots + v_n$ so that each step requires the addition of only two members of V.

3. This problem shows that the commutative and associative laws in the definition of “vector space” together imply certain more complicated formulas of which the stated commutative law is a special case. Let $v_1, \dots, v_n$ be vectors in a vector space $V$. The only vector-space properties that are to be used in this problem are commutativity of addition and the properties in the previous problem. Because of the previous problem, $v_1 + \cdots + v_n$ is a well-defined element of $V$, and it is not necessary to insert any parentheses in it. Prove that $v_1 + v_2 + \cdots + v_n = v_{\sigma(1)} + v_{\sigma(2)} + \cdots + v_{\sigma(n)}$ for each permutation $\sigma$ of $\{1, \dots, n\}$.

4.  For  the  matrix  A  =  I  2  4  6  1,  find 

Vo  o -8/ 

(a)  a  basis  for  the  row  space, 

(b)  a  basis  for  the  column  space,  and 

(c)  the  rank  of  the  matrix. 

5. Let $A$ be an $n$-by-$n$ matrix of rank one. Prove that there exist an $n$-dimensional column vector $c$ and an $n$-dimensional row vector $r$ such that $A = cr$.


6. Let $A$ be a $k$-by-$n$ matrix, and let $R$ be a reduced row-echelon form of $A$.

(a)  Prove  for  each  r  that  the  rows  of  R  whose  first  r  entries  are  0  form  a  basis 
for  the  vector  subspace  of  all  members  of  the  row  space  of  A  whose  first  r 
entries  are  0. 

(b)  Prove  that  the  reduced  row-echelon  form  of  A  is  unique  in  the  sense  that  any 
two  sequences  of  steps  of  row  reduction  lead  to  the  same  reduced  form. 

7. Let $E$ be a finite set of $N$ points, let $V$ be the $N$-dimensional vector space of all real-valued functions on $E$, and let $n$ be an integer with $0 < n < N$. Suppose that $U$ is an $n$-dimensional subspace of $V$. Prove that there exists a subset $D$ of $n$ points in $E$ such that the vector space of restrictions to $D$ of the members of $U$ has dimension $n$.


8. A linear map L :
-6 -
(T ID-
Mz is given in the standard ordered basis by the matrix
Find the matrix of L in the ordered basis j ^ 2 ) . ^ j | .

9. Let $V$ be the real vector space of all polynomials in $x$ of degree $\le 2$, and let $L: V \to V$ be the linear map $I - D^2$, where $I$ is the identity and $D$ is the differentiation operator $d/dx$. Prove that $L$ is invertible.


10. Let $A$ be in $M_{km}(\mathbb{C})$ and $B$ be in $M_{mn}(\mathbb{C})$. Prove that
$$\operatorname{rank}(AB) \le \max(\operatorname{rank} A, \operatorname{rank} B).$$


11. Let $A$ be in $M_{kn}(\mathbb{C})$ with $k > n$. Prove that there exists no $B$ in $M_{nk}(\mathbb{C})$ with $AB = I$.

12. Let $A$ be in $M_{kn}(\mathbb{C})$ and $B$ be in $M_{nk}(\mathbb{C})$. Give an example with $k = n$ to show that $\operatorname{rank}(AB)$ need not equal $\operatorname{rank}(BA)$.

13. With the differential equation $y''(t) = y(t)$ in Example 2 of Section 3, two examples of linear functionals on the vector space of solutions are given by $\ell_1(y) = y(0)$ and $\ell_2(y) = y'(0)$. Find a basis of the space of solutions such that $\{\ell_1, \ell_2\}$ is the dual basis.

14.  Suppose  that  a  vector  space  V  has  a  countably  infinite  basis.  Prove  that  the  dual 
V'  has  an  uncountable  linearly  independent  set. 

15. (a) Give an example of a vector space and three vector subspaces $L$, $M$, and $N$ such that $L \cap (M + N) \neq (L \cap M) + (L \cap N)$.

(b) Show that inclusion always holds in one direction in (a).

(c) Show that equality always holds in (a) if $L \supseteq M$.

16. Construct three vector subspaces $M$, $N_1$, and $N_2$ of a vector space $V$ such that $M \oplus N_1 = M \oplus N_2 = V$ but $N_1 \neq N_2$. What is the geometric picture corresponding to this situation?

17. Suppose that $x$, $y$, $u$, and $v$ are vectors in $\mathbb{R}^4$; let $M$ and $N$ be the vector subspaces of $\mathbb{R}^4$ spanned by $\{x, y\}$ and $\{u, v\}$, respectively. In which of the following cases is it true that $\mathbb{R}^4 = M \oplus N$?

(a)  x  =  (1,  1,0,  0),  y  =  (1,0,  1,0),  u  =  (0,  1,0,  1),  v  =  (0,0,  1,  1); 

(b)  x  =  (-1,  1,  1,0),  y  =  (0,  1,  -1,  1),  u  =  (1,0,  0,0),  v  =  (0,0,0,  1); 

(c)  x  =  (1,0,0,  1),  y  =  (0,  1,  1,0),  u  =  (1,0,  1,0),  v  =  (0,  1,0,  1). 

18.  Section  6  gave  definitions  and  properties  of  projections  and  injections  associated 
with  the  direct  sum  of  two  vector  spaces.  Write  down  corresponding  definitions 
and  properties  for  projections  and  injections  in  the  case  of  the  direct  sum  of  n 
vector  spaces,  n  being  an  integer  >  2. 

19. Let $T: \mathbb{R}^n \to \mathbb{R}^n$ be a linear map with $\ker T \cap \operatorname{image} T = 0$.

(a) Prove that $\mathbb{R}^n = \ker T \oplus \operatorname{image} T$.

(b) Prove that the condition $\ker T \cap \operatorname{image} T = 0$ is satisfied if $T^2 = T$.

20. If $V_1$ and $V_2$ are two vector spaces over $\mathbb{F}$, prove that $(V_1 \oplus V_2)'$ is canonically isomorphic to $V_1' \oplus V_2'$.

21. Suppose that $M$ is a vector subspace of a vector space $V$ and that $q: V \to V/M$ is the quotient map. Corresponding to each linear functional $y$ on $V/M$ is a linear functional $z$ on $V$ given by $z = yq$. Why is the correspondence $y \mapsto z$ an isomorphism between $(V/M)'$ and $\operatorname{Ann} M$?

22. Let $M$ be a vector subspace of the vector space $V$, and let $q: V \to V/M$ be the quotient map. Suppose that $N$ is a vector subspace of $V$. Prove that $V = M \oplus N$ if and only if the restriction of $q$ to $N$ is an isomorphism of $N$ onto $V/M$.

23. For a square matrix $A$ of integers, prove that the inverse has integer entries if and only if $\det A = \pm 1$.

24. Let $A$ be in $M_{kn}(\mathbb{C})$, and let $r = \operatorname{rank} A$. Prove that $r$ is the largest integer such that there exist $r$ row indices $i_1, \dots, i_r$ and $r$ column indices $j_1, \dots, j_r$ for which the $r$-by-$r$ matrix formed from these rows and columns of $A$ has nonzero determinant. (Educational note: This problem characterizes the subset of matrices of rank $\le r - 1$ as the set in which all determinants of $r$-by-$r$ submatrices are zero.)


25. Suppose that a linear combination of functions $t \mapsto e^{ct}$ with $c$ real vanishes for every integer $t > 0$. Prove that it vanishes for every real $t$.

26.  Find  all  eigenvalues  and  eigenvectors  of  A  =  (_!*!)• 


27. Let $A$ and $C$ be $n$-by-$n$ matrices with $C$ invertible. By making a direct calculation with the entries, prove that $\operatorname{Tr}(C^{-1}AC) = \operatorname{Tr} A$.


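28. Find the characteristic polynomial of the $n$-by-$n$ matrix
$$\begin{pmatrix} 0 & 1 & 0 & 0 & \cdots & 0 & 0 \\ 0 & 0 & 1 & 0 & \cdots & 0 & 0 \\ 0 & 0 & 0 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & 0 & \cdots & 0 & 1 \\ a_0 & a_1 & a_2 & a_3 & \cdots & a_{n-2} & a_{n-1} \end{pmatrix}.$$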


29. Let $A$ and $B$ be in $M_{nn}(\mathbb{C})$.

(a) Prove under the assumption that $A$ is invertible that $\det(\lambda I - AB) = \det(\lambda I - BA)$.

(b) By working with $A + \epsilon I$ and letting $\epsilon$ tend to 0, show that the assumption in (a) that $A$ is invertible can be dropped.

30.  In  proving  Theorem  2.42a,  it  is  tempting  to  argue  by  considering  all  spanning 
subsets  of  the  given  set,  ordering  them  by  inclusion  downward,  and  seeking  a 
minimal  element  by  Zorn’s  Lemma.  Give  an  example  of  a  chain  in  this  ordering 
that  has  no  lower  bound,  thereby  showing  that  this  line  of  argument  cannot  work. 
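Problems 4, 6, and 23 all turn on row reduction. For checking hand computations, here is a minimal Python sketch; the helper names `rref`, `rank`, and `inverse` are our own, and exact `Fraction` arithmetic is chosen to avoid rounding. It is a computational aid, not a solution: the rank is counted as the number of nonzero rows of the reduced row-echelon form, and the inverse comes from reducing $(A \mid I)$ to $(I \mid A^{-1})$.

```python
from fractions import Fraction

def rref(matrix):
    """Reduced row-echelon form of `matrix` (a list of rows), computed
    exactly over the rationals with Fraction arithmetic."""
    a = [[Fraction(x) for x in row] for row in matrix]
    rows = len(a)
    cols = len(a[0]) if rows else 0
    pivot_row = 0
    for col in range(cols):
        if pivot_row == rows:
            break
        # Find a row at or below pivot_row with a nonzero entry in this column.
        r = next((i for i in range(pivot_row, rows) if a[i][col] != 0), None)
        if r is None:
            continue                                   # no pivot in this column
        a[pivot_row], a[r] = a[r], a[pivot_row]        # interchange two rows
        p = a[pivot_row][col]
        a[pivot_row] = [x / p for x in a[pivot_row]]   # scale so the pivot is 1
        for i in range(rows):                          # clear the rest of the column
            if i != pivot_row and a[i][col] != 0:
                f = a[i][col]
                a[i] = [x - f * y for x, y in zip(a[i], a[pivot_row])]
        pivot_row += 1
    return a

def rank(matrix):
    """Rank = number of nonzero rows of the reduced row-echelon form."""
    return sum(1 for row in rref(matrix) if any(x != 0 for x in row))

def inverse(matrix):
    """Inverse of a square matrix: row reduce (A | I) to (I | A^{-1})."""
    n = len(matrix)
    augmented = [list(row) + [int(i == j) for j in range(n)]
                 for i, row in enumerate(matrix)]
    reduced = rref(augmented)
    if any(reduced[i][i] != 1 for i in range(n)):
        raise ValueError("matrix is not invertible")
    return [row[n:] for row in reduced]
```

For example, the rows of `[[1, 2, 1], [2, 4, 0], [3, 6, 1]]`, taken in that order or in any other order, reduce to `[[1, 2, 0], [0, 0, 1], [0, 0, 0]]`, consistent with the uniqueness in Problem 6(b). Also `inverse([[2, 1], [1, 1]])` (determinant 1) is `[[1, -1], [-1, 2]]`, while `inverse([[2, 0], [0, 1]])` (determinant 2) has the non-integer entry 1/2, as Problem 23 predicts.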

Problems 31-34 concern annihilators. Let $V$ be a vector space, let $M$ and $N$ be vector subspaces, and let $\iota: V \to V''$ be the canonical map.

31. If $V$ has an infinite basis, how can we conclude that $\iota$ does not carry $V$ onto $V''$?

32. Prove that $\operatorname{Ann}(M + N) = \operatorname{Ann} M \cap \operatorname{Ann} N$.

33. Prove that $\operatorname{Ann}(M \cap N) = \operatorname{Ann} M + \operatorname{Ann} N$.

34. (a) Prove that $\iota(M) \subseteq \operatorname{Ann}(\operatorname{Ann} M)$.

(b)  Prove  that  equality  holds  in  (a)  if  V  is  finite-dimensional. 

(c)  Give  an  infinite-dimensional  example  in  which  equality  fails  in  (a). 

Problems 35-39 concern operations by blocks within matrices.

35. Let $A$ be a $k$-by-$m$ matrix of the form $A = \begin{pmatrix} A_1 & A_2 \end{pmatrix}$, where $A_1$ has size $k$-by-$m_1$, $A_2$ has size $k$-by-$m_2$, and $m_1 + m_2 = m$. Let $B$ be an $m'$-by-$n$ matrix of the form $B = \begin{pmatrix} B_1 \\ B_2 \end{pmatrix}$, where $B_1$ has size $m_1'$-by-$n$, $B_2$ has size $m_2'$-by-$n$, and $m_1' + m_2' = m'$.

(a) If $m_1 = m_1'$ and $m_2 = m_2'$, prove that $AB = A_1B_1 + A_2B_2$.

(b) If $k = n$, prove that
$$BA = \begin{pmatrix} B_1A_1 & B_1A_2 \\ B_2A_1 & B_2A_2 \end{pmatrix}.$$

(c) Deduce a general rule for block multiplication of matrices that are in 2-by-2 block form.


36. Let $A$ be in $M_{kk}(\mathbb{C})$, $B$ be in $M_{kn}(\mathbb{C})$, and $D$ be in $M_{nn}(\mathbb{C})$. Prove that
$$\det \begin{pmatrix} A & B \\ 0 & D \end{pmatrix} = \det A \det D.$$

37. Let $A$, $B$, $C$, and $D$ be in $M_{nn}(\mathbb{C})$. Suppose that $A$ is invertible and that $AC = CA$. Prove that
$$\det \begin{pmatrix} A & B \\ C & D \end{pmatrix} = \det(AD - CB).$$

38. Let $A$ be in $M_{kn}(\mathbb{C})$ and $B$ be in $M_{nk}(\mathbb{C})$ with $k < n$. Let $I_k$ be the $k$-by-$k$ identity, and let $I_n$ be the $n$-by-$n$ identity. Using Problem 29, prove that $\det(\lambda I_n - BA) = \lambda^{n-k}\det(\lambda I_k - AB)$.

39. Prove the following block-form generalization of the expansion-in-cofactors formula. For each subset $S$ of $\{1, \dots, n\}$, let $S^c$ be the complementary subset within $\{1, \dots, n\}$, and let $\operatorname{sgn}(S, S^c)$ be the sign of the permutation that carries $(1, \dots, n)$ to the members of $S$ in order, followed by the members of $S^c$ in order. Fix $k$ with $1 \le k \le n - 1$, and let the subset $S$ have $|S| = k$. For an $n$-by-$n$ matrix $A$, define $A(S)$ to be the square matrix of size $k$ obtained by using the rows of $A$ indexed by $1, \dots, k$ and the columns indexed by the members of $S$. Let $\widetilde{A}(S)$ be the square matrix of size $n - k$ obtained by using the rows of $A$ indexed by $k + 1, \dots, n$ and the columns indexed by the members of $S^c$. Prove that
$$\det A = \sum_{\substack{S \subseteq \{1, \dots, n\} \\ |S| = k}} \operatorname{sgn}(S, S^c) \det A(S) \det \widetilde{A}(S).$$


Problems  40-44  compute  the  determinants  of  certain  matrices  known  as  Cartan 
matrices.  These  have  geometric  significance  in  the  theory  of  Lie  groups. 
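
40. Let $A_n$ be the $n$-by-$n$ matrix
$$A_n = \begin{pmatrix} 2 & -1 & 0 & \cdots & 0 & 0 \\ -1 & 2 & -1 & \cdots & 0 & 0 \\ 0 & -1 & 2 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 2 & -1 \\ 0 & 0 & 0 & \cdots & -1 & 2 \end{pmatrix}.$$
Using expansion in cofactors about the last row, prove that $\det A_n = 2\det A_{n-1} - \det A_{n-2}$ for $n \ge 3$.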


41. Computing $\det A_1$ and $\det A_2$ directly and using the recursion in Problem 40, prove that $\det A_n = n + 1$ for $n \ge 1$.

42. Let $C_n$ for $n \ge 2$ be the matrix $A_n$ except that the $(1, 2)$th entry is changed from $-1$ to $-2$.

(a) Expanding in cofactors about the last row, prove that the argument of Problem 40 is still applicable when $n \ge 4$ and a recursion formula for $\det C_n$ results with the same coefficients.

(b) Computing $\det C_2$ and $\det C_3$ directly and using the recursion equation in (a), prove that $\det C_n = 2$ for $n \ge 2$.

43. Let $D_n$ for $n \ge 3$ be the matrix $A_n$ except that the upper left 3-by-3 piece is changed from
$$\begin{pmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{pmatrix} \quad\text{to}\quad \begin{pmatrix} 2 & 0 & -1 \\ 0 & 2 & -1 \\ -1 & -1 & 2 \end{pmatrix}.$$

(a) Expanding in cofactors about the last row, prove that the argument of Problem 40 is still applicable when $n \ge 5$ and a recursion formula for $\det D_n$ results with the same coefficients.

(b) Show that $D_3$ can be transformed into $A_3$ by suitable interchanges of rows and interchanges of columns, and conclude that $\det D_3 = \det A_3 = 4$.

(c) Computing $\det D_4$ directly and using (b) and the recursion equation in (a), prove that $\det D_n = 4$ for $n \ge 3$.

44. Let $E_n$ for $n \ge 4$ be the matrix $A_n$ except that the upper left 4-by-4 piece is changed from
$$\begin{pmatrix} 2 & -1 & 0 & 0 \\ -1 & 2 & -1 & 0 \\ 0 & -1 & 2 & -1 \\ 0 & 0 & -1 & 2 \end{pmatrix} \quad\text{to}\quad \begin{pmatrix} 2 & -1 & 0 & 0 \\ -1 & 2 & 0 & -1 \\ 0 & 0 & 2 & -1 \\ 0 & -1 & -1 & 2 \end{pmatrix}.$$

(a) Expanding in cofactors about the last row, prove that the argument of Problem 40 is still applicable when $n \ge 6$ and a recursion formula for $\det E_n$ results with the same coefficients.

(b) Show that $E_4$ can be transformed into $A_4$ by suitable interchanges of rows and interchanges of columns, and conclude that $\det E_4 = \det A_4 = 5$.

(c) Show that $E_5$ can be transformed into $D_5$ by suitable interchanges of rows and interchanges of columns, and conclude that $\det E_5 = \det D_5 = 4$.

(d) Using (b) and (c) and the recursion equation in (a), prove that $\det E_n = 9 - n$ for $n \ge 4$.
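The closed forms in Problems 41-44 can be spot-checked by machine. The sketch below is an illustration only. It assumes nothing beyond the matrices as defined in Problems 40-44, the names `det`, `cartan_A`, and so on are our own, and it computes determinants exactly by Gaussian elimination over the rationals rather than by the cofactor expansions the problems ask for.

```python
from fractions import Fraction

def det(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    a = [[Fraction(x) for x in row] for row in matrix]
    n = len(a)
    result = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result                  # a row interchange changes the sign
        result *= a[col][col]
        for r in range(col + 1, n):           # adding multiples of rows keeps det
            factor = a[r][col] / a[col][col]
            a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return result

def cartan_A(n):
    """2 on the diagonal, -1 just above and just below it, 0 elsewhere."""
    m = [[0] * n for _ in range(n)]
    for i in range(n):
        m[i][i] = 2
        if i + 1 < n:
            m[i][i + 1] = m[i + 1][i] = -1
    return m

def with_upper_left(m, block):
    """Overwrite the upper left corner of m with the square matrix `block`."""
    for i, row in enumerate(block):
        m[i][:len(row)] = row
    return m

def cartan_C(n):
    m = cartan_A(n)
    m[0][1] = -2                              # the (1, 2) entry becomes -2
    return m

def cartan_D(n):
    return with_upper_left(cartan_A(n), [[2, 0, -1], [0, 2, -1], [-1, -1, 2]])

def cartan_E(n):
    return with_upper_left(cartan_A(n), [[2, -1, 0, 0], [-1, 2, 0, -1],
                                          [0, 0, 2, -1], [0, -1, -1, 2]])
```

Evaluating `det(cartan_E(n))` for n = 4, ..., 8 gives the values 5, 4, 3, 2, 1, in agreement with Problem 44(d). A check of finitely many cases is, of course, no substitute for the inductive arguments.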

Problems 45-48 relate determinants to areas and volumes. They begin by showing how a computation of an area in $\mathbb{R}^2$ leads to a determinant, they then show how knowledge of the answer and of the method of row reduction illuminates the result, and finally they indicate how the result extends to $\mathbb{R}^3$. If $u$ and $v$ are vectors in $\mathbb{R}^2$, let us say that the parallelogram determined by $u$ and $v$ is the parallelogram with vertices $0$, $u$, $v$, and $u + v$. If $u$, $v$, and $w$ are in $\mathbb{R}^3$, the parallelepiped determined by $u$, $v$, and $w$ is the parallelepiped with vertices $0$, $u$, $v$, $w$, $u + v$, $u + w$, $v + w$, and $u + v + w$.

45. The area of a trapezoid is the average of the lengths of the two parallel sides times the distance between the parallel sides. Compute the area of the parallelogram
determined  by  u  —  ^  j  and  v  =  ^  j  in  the  diagram  below  as  the  area  of  a 
large  rectangle  minus  the  area  of  two  trapezoids  minus  the  area  of  two  triangles, 
recognizing  the  answer  as  det  (  “  *  j  except  for  a  minus  sign.  To  what  extent  is 
the  answer  dependent  on  the  picture? 


FIGURE  2.6.  Area  of  a  parallelogram  as  a  difference  of  areas. 


46.  What  is  the  geometric  effect  on  the  parallelogram  of  replacing  the  matrix  ^  ^ 

by  the  matrix  (‘‘/’j  'j  (o  i)’  i-e-’  °f  right-multiplying  ^  j  by  ^  What 

does  this  change  do  to  the  area?  What  algebraic  operation  does  this  change 
correspond  to? 

47.  Answer  the  same  questions  as  in  Problem  46  for  right  multiplication  by  the 


matrices 
number  r . 


(r  i )’  (to)’  (o  i  )  f°r  a  nonzero  number  q,  and  ^  ^  for 


a  nonzero 


48. Explain on the basis of Problems 45-47 why, if three column vectors $u$, $v$, and $w$ in $\mathbb{R}^3$ are assembled into a 3-by-3 matrix $A$ and $A$ is invertible, the volume of the parallelepiped determined by $u$, $v$, and $w$ has to be $|\det A|$.
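Problems 45-47 can also be explored numerically. The following minimal Python sketch assumes integer coordinates, and its helper names are hypothetical. It computes the area of the parallelogram with vertices $0$, $u$, $u + v$, $v$ by the shoelace formula for polygons, which does not involve determinants, and compares the result with the determinant of the matrix whose columns are $u$ and $v$. Right multiplication by an elementary matrix performs a column operation, so the usage below tries column operations of the three elementary kinds.

```python
def shoelace_area(vertices):
    """Area of a simple polygon whose vertices are listed in order around it."""
    n = len(vertices)
    twice_signed = sum(
        vertices[i][0] * vertices[(i + 1) % n][1]
        - vertices[(i + 1) % n][0] * vertices[i][1]
        for i in range(n)
    )
    return abs(twice_signed) / 2

def parallelogram_area(u, v):
    """Area of the parallelogram determined by u and v (vertices 0, u, u+v, v)."""
    return shoelace_area([(0, 0), u, (u[0] + v[0], u[1] + v[1]), v])

def det2(u, v):
    """Determinant of the 2-by-2 matrix whose columns are u and v."""
    return u[0] * v[1] - v[0] * u[1]
```

With $u = (3, 1)$ and $v = (1, 2)$ both computations give 5. Replacing $v$ by $v + 2u$ leaves the area at 5, interchanging $u$ and $v$ changes only the sign of the determinant, and replacing $v$ by $3v$ triples the area.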


CHAPTER  III 


Inner-Product  Spaces 


Abstract. This chapter investigates the effects of adding the structure of an inner product to a finite-dimensional real or complex vector space.

Section 1 concerns the effect on the vector space itself, defining inner products and their corresponding norms and giving a number of examples and formulas for the computation of norms.
Vector-space  bases  that  are  orthonormal  play  a  special  role. 

Section  2  concerns  the  effect  on  linear  maps.  The  inner  product  makes  itself  felt  partly  through 
the  notion  of  the  adjoint  of  a  linear  map.  The  section  pays  special  attention  to  linear  maps  that  are 
self-adjoint,  i.e.,  are  equal  to  their  own  adjoints,  and  to  those  that  are  unitary,  i.e.,  preserve  norms  of 
vectors. 

Section  3  proves  the  Spectral  Theorem  for  self-adjoint  linear  maps  on  finite-dimensional  inner- 
product  spaces.  The  theorem  says  in  part  that  any  self-adjoint  linear  map  has  an  orthonormal  basis 
of  eigenvectors.  The  Spectral  Theorem  has  several  important  consequences,  one  of  which  is  the 
existence  of  a  unique  positive  semidefinite  square  root  for  any  positive  semidefinite  linear  map.  The 
section  concludes  with  the  polar  decomposition,  showing  that  any  linear  map  factors  as  the  product 
of  a  unitary  linear  map  and  a  positive  semidefinite  one. 


1.  Inner  Products  and  Orthonormal  Sets 

In  this  chapter  we  examine  the  effect  of  adding  further  geometric  structure  to 
the  structure  of  a  real  or  complex  vector  space  as  defined  in  Chapter  II.  To  be 
a  little  more  specific  in  the  cases  of  R2  and  R3,  the  development  of  Chapter  II 
amounted  to  working  with  points,  lines,  planes,  coordinates,  and  parallelism,  but 
nothing  further.  In  the  present  chapter,  by  comparison,  we  shall  take  advantage 
of  additional  structure  that  captures  the  notions  of  distances  and  angles. 

We take $\mathbb{F}$ to be $\mathbb{R}$ or $\mathbb{C}$, continuing to call its members the scalars. We do not allow $\mathbb{F}$ to be $\mathbb{Q}$ in this chapter; the main results will make essential use of additional facts about $\mathbb{R}$ and $\mathbb{C}$ beyond those of addition, subtraction, multiplication, and division. The relevant additional facts are summarized in Sections A3 and A4 of the appendix.1

1 The theory of Chapter II will be observed in Chapter IV to extend to any “field” $\mathbb{F}$ in place of $\mathbb{Q}$ or $\mathbb{R}$ or $\mathbb{C}$, but the theory of the present chapter is limited to $\mathbb{R}$ and $\mathbb{C}$, as well as some other special fields that we shall not try to isolate.


Many  of  the  results  that  we  obtain  will  be  limited  to  the  finite-dimensional  case. 
The  theory  of  inner-product  spaces  that  we  develop  has  an  infinite-dimensional 
generalization,  but  useful  results  for  the  generalization  make  use  of  a  hypothesis 
of  “completeness”  for  an  inner-product  space  that  we  are  not  in  a  position  to 
verify  in  examples.2 

Let $V$ be a vector space over $\mathbb{F}$. An inner product on $V$ is a function from $V \times V$ into $\mathbb{F}$, which we here denote by $(\cdot, \cdot)$, with the following properties:

(i) for each $v$ in $V$, the function $u \mapsto (u, v)$ of $V$ into $\mathbb{F}$ is linear,

(ii) for each $u$ in $V$, the function $v \mapsto (u, v)$ of $V$ into $\mathbb{F}$ is conjugate linear in the sense that it satisfies $(u, v_1 + v_2) = (u, v_1) + (u, v_2)$ for $v_1$ and $v_2$ in $V$ and $(u, cv) = \bar{c}(u, v)$ for $v$ in $V$ and $c$ in $\mathbb{F}$,

(iii) $(u, v) = \overline{(v, u)}$ for $u$ and $v$ in $V$,

(iv) $(v, v) \ge 0$ for all $v$ in $V$,

(v) $(v, v) = 0$ only if $v = 0$ in $V$.

The overbars in (ii) and (iii) indicate complex conjugation. Property (ii) reduces when $\mathbb{F} = \mathbb{R}$ to the fact that $v \mapsto (u, v)$ is linear. Properties (i) and (ii) together are summarized by saying that $(\cdot, \cdot)$ is bilinear if $\mathbb{F} = \mathbb{R}$ or sesquilinear if $\mathbb{F} = \mathbb{C}$. Property (iii) is summarized when $\mathbb{F} = \mathbb{R}$ by saying that $(\cdot, \cdot)$ is symmetric, or when $\mathbb{F} = \mathbb{C}$ by saying that $(\cdot, \cdot)$ is Hermitian symmetric.

An inner-product space, for purposes of this book, is a vector space over $\mathbb{R}$ or $\mathbb{C}$ with an inner product in the above sense.3,4


Examples. 

(1) $V = \mathbb{R}^n$ with $(\cdot, \cdot)$ as the dot product, i.e., with $(x, y) = y^t x = x_1y_1 + \cdots + x_ny_n$ if $x = (x_1, \dots, x_n)^t$ and $y = (y_1, \dots, y_n)^t$. The traditional notation for the dot product is $x \cdot y$.

(2) $V = \mathbb{C}^n$ with $(\cdot, \cdot)$ defined by $(x, y) = \bar{y}^t x = x_1\bar{y}_1 + \cdots + x_n\bar{y}_n$ if $x = (x_1, \dots, x_n)^t$ and $y = (y_1, \dots, y_n)^t$. Here $\bar{y}$ denotes the entry-by-entry complex conjugate of $y$. The sesquilinear expression $(\cdot, \cdot)$ is different from the complex bilinear dot product $x \cdot y = x_1y_1 + \cdots + x_ny_n$.
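To see concretely how the inner product of Example 2 differs from the bilinear dot product, here is a small Python sketch using Python's built-in complex numbers; the function names are ours. Note that $(x, x)$ is real and nonnegative, while $x \cdot x$ need not even be real.

```python
def inner(x, y):
    """(x, y) = x_1 conj(y_1) + ... + x_n conj(y_n): linear in x, conjugate linear in y."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def bilinear_dot(x, y):
    """The complex bilinear dot product x . y = x_1 y_1 + ... + x_n y_n (no conjugation)."""
    return sum(a * b for a, b in zip(x, y))
```

For $x = (1 + 2i, 3i)$ and $y = (2 - i, 1 + i)$ one gets $(x, y) = 3 + 8i = \overline{(y, x)}$ and $(x, x) = 14$, whereas $x \cdot x = -12 + 4i$.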


2 A careful study in the infinite-dimensional case is normally made only after the development of a considerable number of topics in real analysis.

3 When the scalars are complex, many books emphasize the presence of complex scalars by referring to the inner product as a “Hermitian inner product.” This book does not need to distinguish the complex case very often and therefore will not use the modifier “Hermitian” with the term “inner product.”

4 Some authors, particularly in connection with mathematical physics, reverse the roles of the two variables, defining inner products to be conjugate linear in the first variable and linear in the second variable.

(3) $V$ equal to the vector space of all complex-valued polynomials with $(f, g) = \int_0^1 f(x)\overline{g(x)}\,dx$.

Let $V$ be an inner-product space. If $v$ is in $V$, define $\|v\| = \sqrt{(v, v)}$, calling $\|\cdot\|$ the norm associated with the inner product. The norm of $v$ is understood to be the nonnegative square root of the nonnegative real number $(v, v)$ and is well defined as a consequence of (iv). In the case of $\mathbb{R}^n$, $\|x\|$ is the Euclidean distance $\sqrt{x_1^2 + \cdots + x_n^2}$ from the origin to the column vector $x = (x_1, \dots, x_n)$. In this interpretation the dot product of two nonzero vectors in $\mathbb{R}^n$ is shown in analytic geometry to be given by $x \cdot y = \|x\|\,\|y\| \cos\theta$, where $\theta$ is the angle between the vectors $x$ and $y$.

Direct  expansion  of  norms  squared  of  sums  of  vectors  using  bilinearity  or 
sesquilinearity  leads  to  certain  formulas  of  particular  interest.  The  formula  that 
we  shall  use  most  frequently  is 

$$\|u + v\|^2 = \|u\|^2 + 2\operatorname{Re}(u, v) + \|v\|^2,$$

which generalizes from $\mathbb{R}^2$ a version of the law of cosines in trigonometry relating the lengths of the three sides of a triangle when one of the angles is known. With the additional hypothesis that $(u, v) = 0$, this formula generalizes from $\mathbb{R}^2$ the Pythagorean Theorem
$$\|u + v\|^2 = \|u\|^2 + \|v\|^2.$$

Another such formula is the parallelogram law
$$\|u + v\|^2 + \|u - v\|^2 = 2\|u\|^2 + 2\|v\|^2 \quad \text{for all } u \text{ and } v \text{ in } V,$$
which is proved by computing $\|u + v\|^2$ and $\|u - v\|^2$ by the law of cosines and adding the results. The name “parallelogram law” is explained by the geometric interpretation in the case of the dot product for $\mathbb{R}^2$ and is illustrated in Figure 3.1. That figure uses the familiar interpretation of vectors in $\mathbb{R}^2$ as arrows, two arrows being identified if they are translates of one another; thus the arrow from $v$ to $u$ represents the vector $u - v$.

The parallelogram law is closely related to a formula for recovering the inner product from the norm, namely
$$(u, v) = \frac{1}{4} \sum_k i^k \|u + i^k v\|^2,$$
where the sum extends over $k \in \{0, 2\}$ if the scalars are real and over $k \in \{0, 1, 2, 3\}$ if the scalars are complex. This formula goes under the name polarization. To prove it, we expand $\|u + i^k v\|^2 = \|u\|^2 + 2\operatorname{Re}(u, i^k v) + \|v\|^2 = \|u\|^2 + 2\operatorname{Re}\bigl((-i)^k (u, v)\bigr) + \|v\|^2$. Multiplying by $i^k$ and summing on $k$, and using $\sum_k i^k = 0$ to eliminate the terms $\|u\|^2 + \|v\|^2$, shows that $\sum_k i^k \|u + i^k v\|^2 = 2\sum_k i^k \operatorname{Re}\bigl((-i)^k (u, v)\bigr)$. If $k$ is even, then $i^k \operatorname{Re}((-i)^k z) = \operatorname{Re} z$ for any complex $z$, while if $k$ is odd, then $i^k \operatorname{Re}((-i)^k z) = i \operatorname{Im} z$. So $2\sum_k i^k \operatorname{Re}((-i)^k z) = 4z$ (in the real case $z$ is real and only even $k$ occur), and $\sum_k i^k \|u + i^k v\|^2 = 4(u, v)$, as asserted.
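The polarization identity says that the norm alone determines the inner product. Here is a minimal Python sketch on $\mathbb{C}^n$ and $\mathbb{R}^n$, with helper names of our own, that recovers $(x, y)$ from four norm computations (or two, in the real case).

```python
def inner(x, y):
    """(x, y) = x_1 conj(y_1) + ... + x_n conj(y_n)."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def norm_squared(x):
    """||x||^2 = (x, x), a nonnegative real number."""
    return inner(x, x).real

def polarize(x, y, complex_scalars=True):
    """Recover (x, y) from norms alone: (1/4) * sum over k of i^k * ||x + i^k y||^2."""
    # i^k for k in {0, 1, 2, 3} (complex scalars) or for k in {0, 2} (real scalars)
    units = [1, 1j, -1, -1j] if complex_scalars else [1, -1]
    total = sum(c * norm_squared([a + c * b for a, b in zip(x, y)]) for c in units)
    return total / 4
```

With $x = (1 + 2i, 3i)$ and $y = (2 - i, 1 + i)$ the four squared norms are 27, 37, 15, and 5, and $\frac14(27 + 37i - 15 - 5i) = 3 + 8i = (x, y)$.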


Figure 3.1. Geometric interpretation of the parallelogram law: the sum of the squared lengths of the four sides of a parallelogram equals the sum of the squared lengths of the diagonals.

Proposition 3.1 (Schwarz inequality). In any inner-product space $V$, $|(u, v)| \le \|u\|\,\|v\|$ for all $u$ and $v$ in $V$.

Remark.  The  proof  is  written  so  as  to  use  properties  (i)  through  (iv)  in  the 
definition  of  inner  product  but  not  (v),  a  situation  often  encountered  with  integrals. 

PROOF. Possibly replacing $u$ by $e^{i\theta}u$ for some real $\theta$, we may assume that $(u, v)$ is real. In the case that $\|v\| \ne 0$, the law of cosines gives
$$\bigl\|u - \|v\|^{-2}(u, v)v\bigr\|^2 = \|u\|^2 - 2\|v\|^{-2}|(u, v)|^2 + \|v\|^{-4}|(u, v)|^2\|v\|^2.$$
The left side is $\ge 0$, and the right side simplifies to $\|u\|^2 - \|v\|^{-2}|(u, v)|^2$. Thus the inequality follows in this case.

In the case that $\|v\| = 0$, it is enough to prove that $(u, v) = 0$ for all $u$. If $c$ is a scalar, then we have
$$\|u + cv\|^2 = \|u\|^2 + 2\operatorname{Re}\bigl(\bar{c}(u, v)\bigr) + |c|^2\|v\|^2 = \|u\|^2 + 2\operatorname{Re}\bigl(\bar{c}(u, v)\bigr).$$
The left side is $\ge 0$ for every $c$, but the right side is $< 0$ for a suitable choice of $c$ unless $(u, v) = 0$. This completes the proof. □

Proposition 3.2. In any inner-product space $V$, the norm satisfies

(a) $\|v\| \ge 0$ for all $v$ in $V$, with equality if and only if $v = 0$,

(b) $\|cv\| = |c|\,\|v\|$ for all $v$ in $V$ and all scalars $c$,

(c) $\|u + v\| \le \|u\| + \|v\|$ for all $u$ and $v$ in $V$.

PROOF. Conclusion (a) is immediate from properties (iv) and (v) of an inner product, and (b) follows since $\|cv\|^2 = (cv, cv) = c\bar{c}(v, v) = |c|^2\|v\|^2$. Finally we use the law of cosines, the inequality $\operatorname{Re}(u, v) \le |(u, v)|$, and the Schwarz inequality (Proposition 3.1) to write
$$\|u + v\|^2 = \|u\|^2 + 2\operatorname{Re}(u, v) + \|v\|^2 \le \|u\|^2 + 2\|u\|\,\|v\| + \|v\|^2 = (\|u\| + \|v\|)^2.$$
Taking the square root of both sides yields (c). □
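As a sanity check on Propositions 3.1 and 3.2, the following Python sketch tests the Schwarz and triangle inequalities on pseudorandom vectors in $\mathbb{C}^5$. Random testing illustrates the inequalities but does not prove them; the tolerance `tol` is an assumption of the sketch that absorbs floating-point rounding, and the function names are ours.

```python
import math
import random

def inner(x, y):
    """(x, y) = sum_j x_j * conj(y_j) on C^n."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def norm(x):
    """||x|| = sqrt((x, x))."""
    return math.sqrt(inner(x, x).real)

def check_inequalities(trials=1000, n=5, seed=0, tol=1e-12):
    """Return True if Schwarz and triangle inequalities hold on random samples."""
    rng = random.Random(seed)
    for _ in range(trials):
        u = [complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]
        v = [complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]
        if abs(inner(u, v)) > norm(u) * norm(v) + tol:                      # Prop. 3.1
            return False
        if norm([a + b for a, b in zip(u, v)]) > norm(u) + norm(v) + tol:  # Prop. 3.2(c)
            return False
    return True
```

Equality in the Schwarz inequality occurs when one vector is a multiple of the other, for instance $v = 3i\,u$, where both sides equal $3\|u\|^2$.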

Two vectors $u$ and $v$ in $V$ are said to be orthogonal if $(u, v) = 0$, and one sometimes writes $u \perp v$ in this case. The notation is a reminder of the interpretation in the case of the dot product: dot product 0 means that the cosine of the angle between the two vectors is 0 and the vectors are therefore perpendicular. An orthogonal set in $V$ is a set of vectors such that each pair of distinct members is orthogonal.

The nonzero members of an orthogonal set are linearly independent. In fact, if $\{v_1, \dots, v_k\}$ is an orthogonal set of nonzero vectors and some linear combination has $c_1v_1 + \cdots + c_kv_k = 0$, then the inner product of this relation with $v_j$ gives $0 = (c_1v_1 + \cdots + c_kv_k, v_j) = c_j\|v_j\|^2$, and since $\|v_j\| \ne 0$, we see that $c_j = 0$ for each $j$.

A unit vector in $V$ is a vector $u$ with $\|u\| = 1$. If $v$ is any nonzero vector, then $v/\|v\|$ is a unit vector. An orthonormal set in $V$ is an orthogonal set of unit vectors. Under the assumption that $V$ is finite-dimensional, an orthonormal basis of $V$ is an orthonormal set that is a vector-space basis.5

Examples. 

(1) In $\mathbb{R}^n$ or $\mathbb{C}^n$, the standard basis $\{e_1, \dots, e_n\}$ is an orthonormal set.

(2) Let $V$ be the complex inner-product space of all complex finite linear combinations, for $n$ from $-N$ to $+N$, of the functions $x \mapsto e^{inx}$ on the closed interval $[-\pi, \pi]$, the inner product being $(f, g) = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x)\overline{g(x)}\,dx$. With respect to this inner product, the functions $e^{inx}$ form an orthonormal set.
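The orthonormality in Example 2 can be checked numerically. In this minimal Python sketch the integral is replaced by the average of $f\bar{g}$ over $M$ equally spaced points of $[-\pi, \pi)$; that replacement is an assumption of the sketch, not part of the text. For $f = e^{inx}$ and $g = e^{imx}$, the average of $e^{i(n-m)x_j}$ is exactly 0 when $0 < |n - m| < M$ and exactly 1 when $n = m$, so with $M > 2N$ the sketch reproduces the integral up to rounding.

```python
import cmath
import math

def inner(f, g, samples=64):
    """Approximate (f, g) = (1/(2 pi)) * integral over [-pi, pi] of f(x) conj(g(x)) dx
    by the average of f * conj(g) over `samples` equally spaced points."""
    total = 0
    for j in range(samples):
        x = -math.pi + 2 * math.pi * j / samples
        total += f(x) * g(x).conjugate()
    return total / samples

def exponential(n):
    """The function x -> e^{inx}."""
    return lambda x: cmath.exp(1j * n * x)

def max_deviation_from_identity(N, samples=64):
    """Largest |(e_n, e_m) - delta_{nm}| over -N <= n, m <= N."""
    worst = 0.0
    for n in range(-N, N + 1):
        for m in range(-N, N + 1):
            expected = 1 if n == m else 0
            value = inner(exponential(n), exponential(m), samples)
            worst = max(worst, abs(value - expected))
    return worst
```

With too few sample points the discrete average aliases: for $N = 3$ and $M = 4$, the pair $e^{2ix}$, $e^{-2ix}$ appears to have inner product 1.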

A simple but important exercise in an inner-product space is to resolve a vector into the sum of a multiple of a given unit vector and a vector orthogonal to the given unit vector. This exercise is solved as follows: If $v$ is given and $u$ is a unit vector, then $v$ decomposes as
$$v = (v, u)u + \bigl(v - (v, u)u\bigr).$$
Here $(v, u)u$ is a multiple of $u$, and the two components are orthogonal since $\bigl(u, v - (v, u)u\bigr) = (u, v) - \overline{(v, u)}(u, u) = (u, v) - (u, v) = 0$. This decomposition is unique since if $v = v_1 + v_2$ with $v_1 = cu$ and $(v_2, u) = 0$, then the inner product of $v = v_1 + v_2$ with $u$ yields $(v, u) = (cu, u) + (v_2, u) = c$. Hence $c$ must be $(v, u)$, $v_1$ must be $(v, u)u$, and $v_2$ must be $v - (v, u)u$. Figure 3.2 illustrates the decomposition, and Proposition 3.3 generalizes it by replacing the multiples of a single unit vector by the span of a finite orthonormal set.

5 In the infinite-dimensional theory the term “orthonormal basis” is used for an orthonormal set that spans $V$ when limits of finite sums are allowed, in addition to finite sums themselves; when $V$ is infinite-dimensional, an orthonormal basis is never large enough to be a vector-space basis.


FIGURE 3.2. Resolution of $v$ into a component $(v, u)u$ parallel to a unit vector $u$ and a component orthogonal to $u$.

Proposition 3.3. Let $V$ be an inner-product space. If $\{u_1, \dots, u_k\}$ is an orthonormal set in $V$ and if $v$ is given in $V$, then there exists a unique decomposition
$$v = c_1u_1 + \cdots + c_ku_k + v^\perp$$
with $v^\perp$ orthogonal to $u_j$ for $1 \le j \le k$. In this decomposition $c_j = (v, u_j)$.

Remark. The proof illustrates a technique that arises often in mathematics. We seek to prove an existence-uniqueness theorem, and we begin by making calculations toward uniqueness that narrow down the possibilities. We are led to some formulas or conditions, and we use these to define the object in question and thereby prove existence. Although it may not be so clear except in retrospect, this was the technique that lay behind proving the equivalence of various conditions for the invertibility of a square matrix in Section I.6. The technique occurred again in defining and working with determinants in Section II.7.

Proof of uniqueness. Taking the inner product of both sides with $u_j$, we obtain $(v, u_j) = (c_1u_1 + \cdots + c_ku_k + v^\perp, u_j) = c_j$ for each $j$. Then $c_j = (v, u_j)$ is forced, and $v^\perp$ must be given by $v - (v, u_1)u_1 - \cdots - (v, u_k)u_k$. □

Proof of existence. Putting $c_j = (v, u_j)$, we need only check that the difference $v - (v, u_1)u_1 - \cdots - (v, u_k)u_k$ is orthogonal to each $u_j$ with $1 \le j \le k$. Direct calculation, using $(u_i, u_j) = 0$ for $i \ne j$ and $(u_j, u_j) = 1$, gives
$$\Bigl(v - \sum_i (v, u_i)u_i,\ u_j\Bigr) = (v, u_j) - \sum_i \bigl((v, u_i)u_i, u_j\bigr) = (v, u_j) - (v, u_j) = 0,$$
and the proof is complete. □

Corollary 3.4 (Bessel's inequality). Let $V$ be an inner-product space. If $\{u_1, \dots, u_k\}$ is an orthonormal set in $V$ and if $v$ is given in $V$, then $\sum_{j=1}^k |(v, u_j)|^2 \le \|v\|^2$, with equality if and only if $v$ is in $\operatorname{span}\{u_1, \dots, u_k\}$.
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PROOF.  Using  Proposition  3.3,  write  v  =  E/=i  (w,  Uj)uj  +  tr 1  with  v1 
orthogonal  to  u\ ,  ...  ,uk.  Then 

Ml2  =  ( Ell  (+  ui)ui  +  K  E*=,  (+  uj)uj  +  V_L) 

=  E/.j  (+  “<)(«>  «;)(«<»  «/)  +  (  Ei  (w>  «/)«/.  K) 

+  (K  E/  0,  uj)uj)  +  iKll2 
=  E,-j  (+  +)(+  uj)sij  +  o  +  o+  ||i>_L||2 
=  E./=l  l(+  «;)|2  +  I|V_L||2- 

From  Proposition  3.3  we  know  that  i>  is  in  spanjn  i, . . . ,  uk)  if  and  only  if  i+  =  0, 
and  the  corollary  follows.  □ 
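A quick numerical check of Bessel's inequality, written as a Python sketch under the same assumptions as before (lists as vectors, the usual inner product, hypothetical helper names). The function returns the gap $\|v\|^2 - \sum_j |(v, u_j)|^2$, which the proof identifies with $\|v^\perp\|^2$.

```python
import math


def inner(x, y):
    """Usual inner product on F^n, linear in x and conjugate-linear in y."""
    return sum(a * b.conjugate() for a, b in zip(x, y))


def bessel_gap(v, us):
    """Return ||v||^2 - sum_j |(v, u_j)|^2 for an orthonormal list us.

    By Corollary 3.4 this is >= 0; it equals ||v_perp||^2 and vanishes
    exactly when v lies in span(us).
    """
    return inner(v, v).real - sum(abs(inner(v, u)) ** 2 for u in us)


s = 1 / math.sqrt(2)
us = [[s, s, 0.0], [0.0, 0.0, 1.0]]            # orthonormal set in R^3
gap_outside = bessel_gap([1.0, 2.0, 3.0], us)  # about 0.5: strict inequality
gap_inside = bessel_gap([2.0, 2.0, 7.0], us)   # about 0.0: vector in the span
```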

We shall now impose the condition of finite dimensionality in order to obtain suitable kinds of orthonormal sets. The argument will enable us to give a basis-free interpretation of Proposition 3.3 and Corollary 3.4, and we shall obtain equivalent conditions for the vector $v^\perp$ in Proposition 3.3 and Corollary 3.4 to be 0 for every v.

If an ordered set of k linearly independent vectors in the inner-product space V is given, the above proposition suggests a way of adjusting the set so that it becomes orthonormal. Let us write the formulas here and carry out the verification via Proposition 3.3 in the proof of Proposition 3.5 below. The method of adjusting the set so as to make it orthonormal is called the Gram-Schmidt orthogonalization process. The given linearly independent set is denoted by $\{v_1, \dots, v_k\}$, and we define

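$$\begin{aligned}
u_1 &= \frac{v_1}{\|v_1\|}, \\
u_2' &= v_2 - (v_2, u_1)u_1, & u_2 &= \frac{u_2'}{\|u_2'\|}, \\
u_3' &= v_3 - (v_3, u_1)u_1 - (v_3, u_2)u_2, & u_3 &= \frac{u_3'}{\|u_3'\|}, \\
&\ \ \vdots \\
u_k' &= v_k - (v_k, u_1)u_1 - \cdots - (v_k, u_{k-1})u_{k-1}, & u_k &= \frac{u_k'}{\|u_k'\|}.
\end{aligned}$$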

Proposition 3.5. If $\{v_1, \dots, v_k\}$ is a linearly independent set in an inner-product space V, then the Gram-Schmidt orthogonalization process replaces $\{v_1, \dots, v_k\}$ by an orthonormal set $\{u_1, \dots, u_k\}$ such that $\operatorname{span}\{v_1, \dots, v_j\} = \operatorname{span}\{u_1, \dots, u_j\}$ for all j.

PROOF. We argue by induction on j. The base case is j = 1, and the result is evident in this case. Assume inductively that $u_1, \dots, u_{j-1}$ are well defined and orthonormal and that $\operatorname{span}\{v_1, \dots, v_{j-1}\} = \operatorname{span}\{u_1, \dots, u_{j-1}\}$. Proposition 3.3 shows that $u_j'$ is orthogonal to $u_1, \dots, u_{j-1}$. If $u_j' = 0$, then $v_j$ has to be in $\operatorname{span}\{u_1, \dots, u_{j-1}\} = \operatorname{span}\{v_1, \dots, v_{j-1}\}$, and we have a contradiction to the assumed linear independence of $\{v_1, \dots, v_k\}$. Thus $u_j' \ne 0$, and $\{u_1, \dots, u_j\}$ is a well-defined orthonormal set. This set must be linearly independent, and hence its linear span is a j-dimensional vector subspace of the linear span of $\{v_1, \dots, v_j\}$. By Corollary 2.4, the two linear spans coincide. This completes the induction and the proof. □
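The Gram-Schmidt formulas translate directly into code. The Python sketch below implements the classical form exactly as displayed (each coefficient is $(v_j, u_i)$, taken against the original $v_j$); it assumes list-based vectors and the usual inner product, and the tolerance used to detect $u_j' = 0$ in floating point is an arbitrary choice, not something from the text.

```python
import math


def inner(x, y):
    """Usual inner product on F^n, linear in x and conjugate-linear in y."""
    return sum(a * b.conjugate() for a, b in zip(x, y))


def gram_schmidt(vs, tol=1e-12):
    """Orthonormalize a linearly independent list vs (classical Gram-Schmidt).

    Follows u_j' = v_j - sum_{i<j} (v_j, u_i) u_i and u_j = u_j' / ||u_j'||.
    Raises ValueError if some u_j' is (numerically) 0, which by the proof of
    Proposition 3.5 signals linear dependence.
    """
    us = []
    for v in vs:
        w = list(v)
        for u in us:
            c = inner(v, u)                       # coefficient (v_j, u_i)
            w = [a - c * b for a, b in zip(w, u)]
        norm = math.sqrt(inner(w, w).real)
        if norm < tol:
            raise ValueError("vectors are linearly dependent")
        us.append([a / norm for a in w])
    return us


us = gram_schmidt([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
# The last vector comes out as (-1, 1, 1)/sqrt(3), up to rounding.
```

In floating-point work the modified variant, which uses the partially reduced vector in place of $v_j$ when computing each coefficient, is often preferred for stability; in exact arithmetic the two variants agree.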

Corollary 3.6. If V is a finite-dimensional inner-product space, then any orthonormal set in a vector subspace S of V can be extended to an orthonormal basis of S.

PROOF. Extend the given orthonormal set to a basis of S by Corollary 2.3b, listing the given vectors first. Then apply the Gram-Schmidt orthogonalization process. The given vectors do not get changed by the process, as we see from the formulas for the vectors $u_j'$ and $u_j$, and hence the result is an extension of the given orthonormal set to an orthonormal basis. □

Corollary 3.7. If S is a vector subspace of a finite-dimensional inner-product space V, then S has an orthonormal basis.

PROOF. This is the special case of Corollary 3.6 in which the given orthonormal set is empty. □

The set of all vectors orthogonal to a subset M of the inner-product space V is denoted by $M^\perp$. In symbols,

$$M^\perp = \{u \in V \mid (u, v) = 0 \text{ for all } v \in M\}.$$

We see by inspection that $M^\perp$ is a vector subspace. Moreover, $M \cap M^\perp \subseteq 0$ since any u in $M \cap M^\perp$ must have $(u, u) = 0$. The interest in the vector subspace $M^\perp$ comes from the following theorem.

Theorem 3.8 (Projection Theorem). If S is a vector subspace of the finite-dimensional inner-product space V, then every v in V decomposes uniquely as $v = v_1 + v_2$ with $v_1$ in S and $v_2$ in $S^\perp$. In other words, $V = S \oplus S^\perp$.

Remarks. Because of this theorem, $S^\perp$ is often called the orthogonal complement of the vector subspace S.

Proof. Uniqueness follows from the fact that $S \cap S^\perp = 0$. For existence, use of Corollaries 3.7 and 3.6 produces an orthonormal basis $\{u_1, \dots, u_r\}$ of S and extends it to an orthonormal basis $\{u_1, \dots, u_n\}$ of V. The vectors $u_j$ for $j > r$ are orthogonal to each $u_i$ with $i \le r$ and hence are in $S^\perp$. If v is given in V, then Proposition 3.3 (whose vector $v^\perp$ is orthogonal to a basis of V and hence is 0) allows us to write $v = \sum_{j=1}^n (v, u_j)u_j$ as $v = v_1 + v_2$ with $v_1 = \sum_{j=1}^r (v, u_j)u_j$ and $v_2 = \sum_{j=r+1}^n (v, u_j)u_j$, and this decomposition for all v shows that $V = S + S^\perp$. □

Corollary 3.9. If S is a vector subspace of the finite-dimensional inner-product space V, then

(a) $\dim V = \dim S + \dim S^\perp$,

(b) $S^{\perp\perp} = S$.

PROOF. Conclusion (a) is immediate from the direct-sum decomposition $V = S \oplus S^\perp$ of Theorem 3.8. For (b), the definition of orthogonal complement gives $S \subseteq S^{\perp\perp}$. On the other hand, application of (a) twice shows that S and $S^{\perp\perp}$ have the same finite dimension. By Corollary 2.4, $S^{\perp\perp} = S$. □

Section II.6 introduced “projection” mappings in the setting of any direct sum of two vector spaces, and we shall use those mappings in connection with the decomposition $V = S \oplus S^\perp$ of Theorem 3.8. We make one adjustment in working with the projections, changing their ranges from the image, namely S or $S^\perp$, to the larger space V. In effect, a linear map $p_1$ or $p_2$ as in Section II.6 will be replaced by $i_1p_1$ or $i_2p_2$.

Specifically let $E : V \to V$ be the linear map that is the identity on S and is 0 on $S^\perp$. Then E is called the orthogonal projection of V on S. The linear map $I - E$ is the identity on $S^\perp$ and is 0 on S. Since $S = S^{\perp\perp}$, $I - E$ is the orthogonal projection of V on $S^\perp$. It is the linear map that picks out the $S^\perp$ component relative to the direct-sum decomposition $V = S^\perp \oplus S^{\perp\perp}$. Proposition 3.3 and Corollary 3.4 can be restated in terms of orthogonal projections.

Corollary 3.10. Let V be a finite-dimensional inner-product space, let S be a vector subspace of V, let $\{u_1, \dots, u_k\}$ be an orthonormal basis of S, and let E be the orthogonal projection of V on S. If v is in V, then

$$E(v) = \sum_{j=1}^k (v, u_j)u_j \qquad\text{and}\qquad \|E(v)\|^2 = \sum_{j=1}^k |(v, u_j)|^2.$$

The vector $v^\perp$ in the expansion $v = \sum_{j=1}^k (v, u_j)u_j + v^\perp$ of Proposition 3.3 is equal to $(I - E)v$, and the equality of norms

$$\|v\|^2 = \sum_{j=1}^k |(v, u_j)|^2 + \|v^\perp\|^2$$

has the interpretations that

$$\|v\|^2 = \|E(v)\|^2 + \|(I - E)v\|^2$$

and that equality holds in Bessel's inequality if and only if $E(v) = v$.

Proof. Write $v = \sum_{j=1}^k (v, u_j)u_j + v^\perp$ as in Proposition 3.3. Then each $u_j$ is in S, and the vector $v^\perp$, being orthogonal to each member of a basis of S, is in $S^\perp$. This proves the formula for $E(v)$, and the formula for $\|E(v)\|^2$ follows by applying Corollary 3.4 to $v - v^\perp$, which lies in S and satisfies $(v - v^\perp, u_j) = (v, u_j)$ for each j.

Reassembling v, we now have $v = E(v) + v^\perp$, and hence $v^\perp = v - E(v) = (I - E)v$. Finally the decomposition $v = E(v) + (I - E)(v)$ is into orthogonal terms, and the Pythagorean Theorem shows that $\|v\|^2 = \|E(v)\|^2 + \|(I - E)v\|^2$. □
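Corollary 3.10 can be illustrated with a small Python sketch, again assuming list-based vectors, the usual inner product, and hypothetical helper names. It computes $E(v)$ from an orthonormal basis of S, then checks the Pythagorean identity, that $E(E(v)) = E(v)$, and that $(I - E)v$ lies in $S^\perp$.

```python
import math


def inner(x, y):
    """Usual inner product on F^n, linear in x and conjugate-linear in y."""
    return sum(a * b.conjugate() for a, b in zip(x, y))


def project(v, basis):
    """E(v) = sum_j (v, u_j) u_j for an orthonormal basis of S (Corollary 3.10)."""
    out = [0] * len(v)
    for u in basis:
        c = inner(v, u)
        out = [a + c * b for a, b in zip(out, u)]
    return out


s = 1 / math.sqrt(2)
basis = [[s, s, 0.0], [0.0, 0.0, 1.0]]   # orthonormal basis of a plane S in R^3
v = [1.0, 2.0, 3.0]
ev = project(v, basis)                   # E(v), roughly [1.5, 1.5, 3.0]
rest = [a - b for a, b in zip(v, ev)]    # (I - E)v, roughly [-0.5, 0.5, 0.0]
lhs = inner(v, v)                        # ||v||^2
rhs = inner(ev, ev) + inner(rest, rest)  # ||E(v)||^2 + ||(I - E)v||^2
```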

Theorem 3.11 (Parseval's equality). If V is a finite-dimensional inner-product space, then the following conditions on an orthonormal set $\{u_1, \dots, u_m\}$ are equivalent:

(a) $\{u_1, \dots, u_m\}$ is a vector-space basis of V, hence an orthonormal basis,

(b) the only vector orthogonal to all of $u_1, \dots, u_m$ is 0,

(c) $v = \sum_{j=1}^m (v, u_j)u_j$ for all v in V,

(d) $\|v\|^2 = \sum_{j=1}^m |(v, u_j)|^2$ for all v in V,

(e) $(v, w) = \sum_{j=1}^m (v, u_j)\overline{(w, u_j)}$ for all v and w in V.

PROOF. Let $S = \operatorname{span}\{u_1, \dots, u_m\}$, and let E be the orthogonal projection of V on S. If (a) holds, then S = V and $S^\perp = 0$. Thus (b) holds.

If (b) holds, then $S^\perp = 0$ and E is the identity. Thus (c) holds by Corollary 3.10.

If (c) holds, then Corollary 3.4 shows that (d) holds.

If (d) holds, we use polarization to prove (e). Let k be in $\{0, 2\}$ if $\mathbb{F} = \mathbb{R}$, or in $\{0, 1, 2, 3\}$ if $\mathbb{F} = \mathbb{C}$. Conclusion (d), applied to $v + i^kw$, to v, and to w, gives us

$$\|v + i^kw\|^2 = \sum_{j=1}^m |(v + i^kw, u_j)|^2 = \|v\|^2 + 2\sum_{j=1}^m \operatorname{Re}\big((-i)^k(v, u_j)\overline{(w, u_j)}\big) + \|w\|^2.$$

Multiplying by $i^k$ and summing over k, we find that the terms $\|v\|^2$ and $\|w\|^2$ drop out because $\sum_k i^k = 0$, while the left side becomes $4(v, w)$ by polarization. We obtain

$$4(v, w) = 2\sum_{j=1}^m \sum_k i^k \operatorname{Re}\big((-i)^k(v, u_j)\overline{(w, u_j)}\big).$$

In the proof of polarization, we saw that $2\sum_k i^k \operatorname{Re}((-i)^kz) = 4z$. Hence $4(v, w) = 4\sum_{j=1}^m (v, u_j)\overline{(w, u_j)}$. This proves (e).

If (e) holds, we take w = v in (e) and apply Corollary 3.10 to see that $\|E(v)\|^2 = \|v\|^2$ for all v. Then $\|(I - E)v\|^2 = 0$ for all v, and $E(v) = v$ for all v. Hence S = V, and $\{u_1, \dots, u_m\}$ is a basis. This proves (a). □
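The polarization step and condition (e) can be tested numerically. This Python sketch assumes the usual inner product on $\mathbb{C}^2$ and takes $k \in \{0, 1, 2, 3\}$; the orthonormal basis and the vectors are arbitrary sample data, and the function names are hypothetical.

```python
import math


def inner(x, y):
    """Usual inner product on C^n, linear in x and conjugate-linear in y."""
    return sum(a * b.conjugate() for a, b in zip(x, y))


def norm_sq(x):
    return inner(x, x).real


def polarize(v, w):
    """Recover (v, w) from norms: (1/4) * sum_k i^k ||v + i^k w||^2, k = 0..3."""
    total = 0
    for k in range(4):
        ik = 1j ** k
        total += ik * norm_sq([a + ik * b for a, b in zip(v, w)])
    return total / 4


def parseval(v, w, basis):
    """Right side of Theorem 3.11(e): sum_j (v, u_j) * conj((w, u_j))."""
    return sum(inner(v, u) * inner(w, u).conjugate() for u in basis)


r = 1 / math.sqrt(2)
basis = [[r, 1j * r], [r, -1j * r]]   # an orthonormal basis of C^2
v = [1 + 2j, 3 - 1j]
w = [-2j, 4 + 0.5j]
```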


Theorem 3.12 (Riesz Representation Theorem). If $\ell$ is a linear functional on the finite-dimensional inner-product space V, then there exists a unique v in V with $\ell(u) = (u, v)$ for all u in V.

PROOF. Uniqueness is immediate by subtracting two such expressions, since if $(u, v) = 0$ for all u, then the special case u = v gives $(v, v) = 0$ and v = 0. Let us prove existence. If $\ell = 0$, take v = 0. Otherwise let $S = \ker \ell$. Corollary 2.15 shows that $\dim S = \dim V - 1$, and Corollary 3.9a then shows that $\dim S^\perp = 1$. Let w be a nonzero vector in $S^\perp$. This vector must have $\ell(w) \ne 0$ since $S \cap S^\perp = 0$, and we let v be the member of $S^\perp$ given by

$$v = \frac{\overline{\ell(w)}}{\|w\|^2}\,w.$$

For any u in V, we have $\ell\big(u - \frac{\ell(u)}{\ell(w)}w\big) = 0$, and hence $u - \frac{\ell(u)}{\ell(w)}w$ is in S. Since v is in $S^\perp$, $u - \frac{\ell(u)}{\ell(w)}w$ is orthogonal to v. Thus

$$(u, v) = \Big(\frac{\ell(u)}{\ell(w)}w,\ v\Big) = \Big(\frac{\ell(u)}{\ell(w)}w,\ \frac{\overline{\ell(w)}}{\|w\|^2}w\Big) = \frac{\ell(u)}{\ell(w)}\cdot\frac{\ell(w)}{\|w\|^2}\,\|w\|^2 = \ell(u).$$

This proves existence. □
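For computation it is often simpler to build the Riesz vector from an orthonormal basis than from the kernel of $\ell$: by Theorem 3.11c, $v = \sum_j \overline{\ell(u_j)}\,u_j$ satisfies $(u, v) = \sum_j \ell(u_j)(u, u_j) = \ell(u)$. The Python sketch below uses this coordinate formula, which is a substitute for the argument in the proof rather than a transcription of it; the functional `ell` and the helper names are hypothetical examples.

```python
import math


def inner(x, y):
    """Usual inner product on C^n, linear in x and conjugate-linear in y."""
    return sum(a * b.conjugate() for a, b in zip(x, y))


def riesz_vector(ell, basis):
    """Return v with ell(u) = (u, v) for all u, as v = sum_j conj(ell(u_j)) u_j.

    basis must be an orthonormal basis.  By the uniqueness in Theorem 3.12,
    the result does not depend on which orthonormal basis is used.
    """
    v = [0] * len(basis[0])
    for u in basis:
        c = complex(ell(u)).conjugate()
        v = [a + c * b for a, b in zip(v, u)]
    return v


def ell(u):
    """A sample linear functional on C^3, chosen only for illustration."""
    return (2 - 1j) * u[0] + 3j * u[2]


std = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
r = 1 / math.sqrt(2)
other = [[r, 1j * r, 0], [r, -1j * r, 0], [0, 0, 1]]   # another orthonormal basis
v_std = riesz_vector(ell, std)       # equals [2+1j, 0, -3j]
v_other = riesz_vector(ell, other)   # the same vector, up to rounding
```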


2.  Adjoints 

Throughout this section, V will denote a finite-dimensional inner-product space with inner product $(\cdot, \cdot)$ and with scalars from $\mathbb{F}$, with $\mathbb{F}$ equal to $\mathbb{R}$ or $\mathbb{C}$. We shall study aspects of linear maps $L : V \to V$ related to the inner product on V. The starting point is to associate to any such L another linear map $L^* : V \to V$ known as the “adjoint” of L, and then to investigate some of its properties. A tool in this investigation will be the scalar-valued function on $V \times V$ given by $(u, v) \mapsto (L(u), v)$, which captures the information in any matrix of L without requiring the choice of an ordered basis. This function determines L uniquely because an equality $(L(u), v) = (L'(u), v)$ for all u and v implies $(L(u) - L'(u), v) = 0$ for all u and v, in particular for $v = L(u) - L'(u)$; thus $\|L(u) - L'(u)\|^2 = 0$ and $L(u) = L'(u)$ for all u.

Proposition 3.13. Let $L : V \to V$ be a linear map on the finite-dimensional inner-product space V. For each u in V, there exists a unique vector $L^*(u)$ in V such that

$$(L(v), u) = (v, L^*(u)) \quad\text{for all } v \text{ in } V.$$

As u varies, this formula defines $L^*$ as a linear map from V to V.

Remark. The linear map $L^* : V \to V$ is called the adjoint of L.

PROOF. The function $v \mapsto (L(v), u)$ is a linear functional on V, and Theorem 3.12 shows that it is given by the inner product with a unique vector of V. Thus we define $L^*(u)$ to be the unique vector of V with $(L(v), u) = (v, L^*(u))$ for all v in V.

If c is a scalar, then the uniqueness and the computation $(v, L^*(cu)) = (L(v), cu) = \bar c(L(v), u) = \bar c(v, L^*(u)) = (v, cL^*(u))$ yield $L^*(cu) = cL^*(u)$. Similarly the uniqueness and the computation

$$\begin{aligned}
(v, L^*(u_1 + u_2)) &= (L(v), u_1 + u_2) = (L(v), u_1) + (L(v), u_2) \\
&= (v, L^*(u_1)) + (v, L^*(u_2)) = (v, L^*(u_1) + L^*(u_2))
\end{aligned}$$

yield $L^*(u_1 + u_2) = L^*(u_1) + L^*(u_2)$. Therefore $L^*$ is linear. □


The passage $L \mapsto L^*$ to the adjoint is a function from $\operatorname{Hom}_{\mathbb{F}}(V, V)$ to itself that is conjugate linear, and it reverses the order of multiplication: $(L_1L_2)^* = L_2^*L_1^*$. Since the formula $(L(v), u) = (v, L^*(u))$ in the proposition is equivalent to the formula $(u, L(v)) = (L^*(u), v)$, we see that $L^{**} = L$.

All of the results in Section II.3 concerning the association of matrices to linear maps are applicable here, but our interest now will be in what happens when the bases we use are orthonormal. Recall from Section II.3 that if $\Gamma = (u_1, \dots, u_n)$ and $\Delta = (v_1, \dots, v_n)$ are any ordered bases of V, then the matrix $A = \begin{pmatrix} L \\ \Delta\Gamma \end{pmatrix}$ associated to the linear map $L : V \to V$ has $A_{ij} = \begin{pmatrix} L(u_j) \\ \Delta \end{pmatrix}_i$, the ith entry of the coordinate column of $L(u_j)$ relative to $\Delta$.

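Lemma 3.14. If $L : V \to V$ is a linear map on the finite-dimensional inner-product space V and if $\Gamma = (u_1, \dots, u_n)$ and $\Delta = (v_1, \dots, v_n)$ are ordered orthonormal bases of V, then the matrix $A = \begin{pmatrix} L \\ \Delta\Gamma \end{pmatrix}$ has $A_{ij} = (L(u_j), v_i)$.

PROOF. Applying Theorem 3.11c, we have

$$A_{ij} = \begin{pmatrix} L(u_j) \\ \Delta \end{pmatrix}_i = \begin{pmatrix} \sum_{i'} (L(u_j), v_{i'})v_{i'} \\ \Delta \end{pmatrix}_i = \sum_{i'} (L(u_j), v_{i'})\,\delta_{i'i} = (L(u_j), v_i). \qquad\square$$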

Proposition 3.15. If $L : V \to V$ is a linear map on the finite-dimensional inner-product space V and if $\Gamma = (u_1, \dots, u_n)$ and $\Delta = (v_1, \dots, v_n)$ are ordered orthonormal bases of V, then the matrices $A = \begin{pmatrix} L \\ \Delta\Gamma \end{pmatrix}$ and $A^* = \begin{pmatrix} L^* \\ \Gamma\Delta \end{pmatrix}$ of L and its adjoint are related by $A^* = \bar A^t$, the conjugate transpose of A.

PROOF. Lemma 3.14 and the definition of $L^*$ give $A^*_{ij} = (L^*(v_j), u_i) = (v_j, L(u_i)) = \overline{(L(u_i), v_j)} = \overline{A_{ji}}$. □
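Proposition 3.15 says that, in orthonormal bases, the adjoint is computed by conjugate transpose. The Python sketch below works in $\mathbb{C}^2$ with the standard basis (so the matrix acts by $v \mapsto Av$) and checks the defining identity $(L(v), u) = (v, L^*(u))$; the matrices and vectors are arbitrary sample values, and the helper names are hypothetical.

```python
def inner(x, y):
    """Usual inner product on C^n, linear in x and conjugate-linear in y."""
    return sum(a * b.conjugate() for a, b in zip(x, y))


def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def adjoint(A):
    """Conjugate transpose: (A*)_{ij} = conj(A_{ji}), as in Proposition 3.15."""
    return [[A[j][i].conjugate() for j in range(len(A))] for i in range(len(A[0]))]


A = [[1 + 2j, 3], [-1j, 4 - 1j]]
v = [2, 1 - 1j]
u = [1j, -3 + 2j]
lhs = inner(matvec(A, v), u)            # (L(v), u)
rhs = inner(v, matvec(adjoint(A), u))   # (v, L*(u))

H = [[2, 1 - 1j], [1 + 1j, 5]]          # a Hermitian matrix: adjoint(H) == H
```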


Accordingly, we define $A^* = \bar A^t$ for any square matrix A, sometimes calling $A^*$ the adjoint⁶ of A.

A linear map $L : V \to V$ is called self-adjoint if $L^* = L$. Correspondingly a square matrix A is self-adjoint if $A^* = A$. It is more common, however, to say that a matrix with $A^* = A$ is symmetric if $\mathbb{F} = \mathbb{R}$ or Hermitian⁷ if $\mathbb{F} = \mathbb{C}$. A real Hermitian matrix is symmetric, and the term “Hermitian” is thus applicable also when $\mathbb{F} = \mathbb{R}$.

Any Hermitian matrix A arises from a self-adjoint linear map L. Namely, we take V to be $\mathbb{F}^n$ with the usual inner product, and we let $\Gamma$ and $\Delta$ each be the standard ordered basis $\Sigma = (e_1, \dots, e_n)$. This basis is orthonormal, and we define L by the matrix product $L(v) = Av$ for any column vector v. We know that $\begin{pmatrix} L \\ \Sigma\Sigma \end{pmatrix} = A$. Since $A^* = A$, we conclude from Proposition 3.15 that $L^* = L$. Thus we are free to deduce properties of Hermitian matrices from properties of self-adjoint linear maps.

Self-adjoint linear maps will be of special interest to us. Nontrivial examples of self-adjoint linear maps, constructed without simply writing down Hermitian matrices, may be produced by the following proposition.

Proposition 3.16. If V is a finite-dimensional inner-product space and S is a vector subspace of V, then the orthogonal projection $E : V \to V$ of V on S is self-adjoint.

PROOF. Let $v = v_1 + v_2$ and $u = u_1 + u_2$ be the decompositions of two members of V according to $V = S \oplus S^\perp$. Then we have $(v, E^*(u)) = (E(v), u) = (v_1, u_1 + u_2) = (v_1, u_1) = (v_1 + v_2, u_1) = (v, E(u))$, and the proposition follows by the uniqueness in Proposition 3.13. □


⁶The name “adjoint” happens to coincide with the name for a different notion that arose in connection with Cramer's rule in Section II.7. The two notions never seem to arise at the same time, and thus no confusion need occur.

⁷The term “Hermitian” is used also for a class of linear maps in the infinite-dimensional case, but care is needed because the terms “Hermitian” and “self-adjoint” mean different things in the infinite-dimensional case.

To understand Proposition 3.16 in terms of matrices, take an ordered orthonormal basis $(u_1, \dots, u_r)$ of S, and extend it to an ordered orthonormal basis $\Gamma = (u_1, \dots, u_n)$ of V. Then

$$E(u_j) = \begin{cases} u_j & \text{for } j \le r, \\ 0 & \text{for } j > r, \end{cases}$$

and hence $\begin{pmatrix} E(u_j) \\ \Gamma \end{pmatrix}$ equals the jth standard basis vector $e_j$ if $j \le r$ and equals 0 if $j > r$. Consequently the matrix $\begin{pmatrix} E \\ \Gamma\Gamma \end{pmatrix}$ is diagonal with 1's in the first r diagonal entries and 0's elsewhere. This matrix is equal to its conjugate transpose, as it must be according to Propositions 3.15 and 3.16.
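In the standard basis, rather than in a basis adapted to S, the matrix of E is no longer diagonal, but it is still Hermitian. The Python sketch below assumes an orthonormal basis of S is given as lists; by Corollary 3.10 the entries are $P_{ab} = \sum_j u_j[a]\,\overline{u_j[b]}$, and the code checks $P^* = P$ and $P^2 = P$. The helper names and the sample line S are hypothetical.

```python
import math


def projection_matrix(basis):
    """Standard-basis matrix of E for an orthonormal basis of S.

    Entry (a, b) is sum_j u_j[a] * conj(u_j[b]), read off from
    E(v) = sum_j (v, u_j) u_j in Corollary 3.10.
    """
    n = len(basis[0])
    return [[sum(u[a] * u[b].conjugate() for u in basis) for b in range(n)]
            for a in range(n)]


def adjoint(A):
    return [[A[j][i].conjugate() for j in range(len(A))] for i in range(len(A[0]))]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def close(A, B, tol=1e-12):
    return all(abs(a - b) < tol for ra, rb in zip(A, B) for a, b in zip(ra, rb))


r = 1 / math.sqrt(2)
P = projection_matrix([[r, 1j * r, 0]])   # E for a complex line S in C^3
hermitian = close(P, adjoint(P))          # Proposition 3.16
idempotent = close(matmul(P, P), P)       # E composed with E equals E
```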


Proposition 3.17. If V is a finite-dimensional inner-product space and $L : V \to V$ is a self-adjoint linear map, then $(L(v), v)$ is in $\mathbb{R}$ for every v in V, and consequently every eigenvalue of L is in $\mathbb{R}$. Conversely if $\mathbb{F} = \mathbb{C}$ and if $L : V \to V$ is a linear map such that $(L(v), v)$ is in $\mathbb{R}$ for every v in V, then L is self-adjoint.

Remark. The hypothesis $\mathbb{F} = \mathbb{C}$ is essential in the converse. In fact, the 90° rotation L of $\mathbb{R}^2$ whose matrix in the standard basis is $\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$ is not self-adjoint but does have $L(v) \cdot v = 0$ for every v in $\mathbb{R}^2$.

PROOF. If $L = L^*$, then $(L(v), v) = (v, L^*(v)) = (v, L(v)) = \overline{(L(v), v)}$, and hence $(L(v), v)$ is real-valued. If v is an eigenvector with eigenvalue $\lambda$, then substitution of $L(v) = \lambda v$ into $(L(v), v) = \overline{(L(v), v)}$ gives $\lambda\|v\|^2 = \bar\lambda\|v\|^2$. Since $v \ne 0$, $\lambda$ must be real.

For the converse we begin with the special case that $(L(w), w) = 0$ for all w. For $0 \le k \le 3$, we then have

$$(-i)^k(L(u), v) + i^k(L(v), u) = (L(u + i^kv), u + i^kv) - (L(u), u) - (L(v), v) = 0.$$

Taking k = 0 gives $(L(u), v) + (L(v), u) = 0$, while taking k = 1 and multiplying by i gives $(L(u), v) - (L(v), u) = 0$. Hence $(L(u), v) = 0$ for all u and v. Since the function $(u, v) \mapsto (L(u), v)$ determines L, we obtain L = 0.

In the general case, $(L(v), v)$ real-valued implies that $(L(v), v) = \overline{(L(v), v)} = (v, L(v)) = (L^*(v), v)$ for all v. Therefore $((L - L^*)(v), v) = 0$ for all v, and the special case shows that $L - L^* = 0$. This completes the proof. □
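The Remark after Proposition 3.17 can be seen concretely. This Python sketch evaluates $(L(v), v)$ for the rotation matrix of the Remark, first at a few real vectors and then at the complex vector $(1, i)$, where the value is not real; the sample vectors are arbitrary, and the helper names are hypothetical.

```python
def inner(x, y):
    """Usual inner product on C^n, linear in x and conjugate-linear in y."""
    return sum(a * b.conjugate() for a, b in zip(x, y))


def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]


R = [[0, -1], [1, 0]]   # the 90-degree rotation of the Remark

# Over R, (L(v), v) vanishes for every real v ...
real_values = [inner(matvec(R, v), v) for v in ([1, 0], [2, 3], [-5, 7])]
# ... but over C the same matrix gives a non-real value at v = (1, i),
# consistent with the converse of Proposition 3.17, since R is not self-adjoint.
complex_value = inner(matvec(R, [1, 1j]), [1, 1j])
```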

We conclude this section by examining one further class of linear maps having a special relationship with their adjoints.

Proposition 3.18. If V is a finite-dimensional inner-product space, then the following conditions on a linear map $L : V \to V$ are equivalent:

(a) $L^*L = I$,

(b) L carries some orthonormal basis of V to an orthonormal basis,

(c) L carries each orthonormal basis of V to an orthonormal basis,

(d) $(L(u), L(v)) = (u, v)$ for all u and v in V,

(e) $\|L(v)\| = \|v\|$ for all v in V.

Remark. A linear map satisfying these equivalent conditions is said to be orthogonal if $\mathbb{F} = \mathbb{R}$ and unitary if $\mathbb{F} = \mathbb{C}$.

PROOF. We prove that (a), (d), and (e) are equivalent and that (b), (c), and (d) are equivalent.

If (a) holds and u and v are given in V, then $(L(u), L(v)) = (L^*L(u), v) = (I(u), v) = (u, v)$, and (d) holds. If (d) holds, then setting u = v shows that (e) holds. If (e) holds, we use polarization twice, with k ranging as in the proof of Theorem 3.11, to write

$$\begin{aligned}
(L(u), L(v)) &= \tfrac14\sum_k i^k\|L(u) + i^kL(v)\|^2 = \tfrac14\sum_k i^k\|L(u + i^kv)\|^2 \\
&= \tfrac14\sum_k i^k\|u + i^kv\|^2 = (u, v).
\end{aligned}$$

Then $((L^*L - I)(u), v) = 0$ for all u and v, and we conclude that (a) holds.

Since (b) is a special case of (c), and (c) follows from (d) (a map preserving inner products carries an orthonormal basis to an orthonormal set of $\dim V$ vectors, which is linearly independent and hence a basis), proving that (b) implies (d) will prove that (b), (c), and (d) are equivalent. Thus let $\{u_1, \dots, u_n\}$ be an orthonormal basis of V such that $\{L(u_1), \dots, L(u_n)\}$ is an orthonormal basis, and let u and v be given. Then

$$\begin{aligned}
(L(u), L(v)) &= \Big(L\Big(\sum_i (u, u_i)u_i\Big),\ L\Big(\sum_j (v, u_j)u_j\Big)\Big) \\
&= \sum_{i,j} (u, u_i)\overline{(v, u_j)}\,(L(u_i), L(u_j)) \\
&= \sum_{i,j} (u, u_i)\overline{(v, u_j)}\,\delta_{ij} = \sum_i (u, u_i)\overline{(v, u_i)} = (u, v),
\end{aligned}$$

the last equality following from Parseval's equality (Theorem 3.11). □
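Conditions (a), (d), and (e) of Proposition 3.18 can be spot-checked for a concrete matrix. The Python sketch below uses an arbitrarily chosen unitary 2-by-2 matrix and sample vectors (all assumptions for illustration) with the usual inner product on $\mathbb{C}^2$; helper names are hypothetical.

```python
import math


def inner(x, y):
    """Usual inner product on C^n, linear in x and conjugate-linear in y."""
    return sum(a * b.conjugate() for a, b in zip(x, y))


def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def adjoint(A):
    return [[A[j][i].conjugate() for j in range(len(A))] for i in range(len(A[0]))]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


r = 1 / math.sqrt(2)
U = [[r, 1j * r], [1j * r, r]]                   # a sample unitary matrix
gram = matmul(adjoint(U), U)                     # condition (a): the identity
u, v = [1 + 1j, -2], [3j, 0.5 - 1j]
ip_before = inner(u, v)
ip_after = inner(matvec(U, u), matvec(U, v))     # condition (d): unchanged
norm_before = math.sqrt(inner(v, v).real)
norm_after = math.sqrt(inner(matvec(U, v), matvec(U, v)).real)  # condition (e)
```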

As with self-adjointness, we use the geometrically meaningful definition for linear maps to obtain a definition for matrices: a square matrix A with $A^*A = I$ is said to be orthogonal if $\mathbb{F} = \mathbb{R}$ and unitary if $\mathbb{F} = \mathbb{C}$. The condition is that A is invertible and its inverse equals its adjoint. In terms of individual entries, the condition is that $\sum_k A^*_{ik}A_{kj} = \delta_{ij}$, hence that $\sum_k \overline{A_{ki}}A_{kj} = \delta_{ij}$. This is the condition that the columns of A form an orthonormal basis relative to the usual inner product on $\mathbb{R}^n$ or $\mathbb{C}^n$. A real unitary matrix is orthogonal.

If A is an orthogonal or unitary matrix, we can construct a corresponding orthogonal or unitary linear map on $\mathbb{R}^n$ or $\mathbb{C}^n$ relative to the standard ordered basis $\Sigma$. Namely, we define $L(v) = Av$, and Proposition 3.15 shows that L is orthogonal or unitary: $L^*L(v) = A^*Av = Iv = v$. Proposition 3.19 below gives a converse.

Let us notice that an orthogonal or unitary matrix A necessarily has $|\det A| = 1$. In fact, the formula $A^* = (\bar A)^t$ implies that $\det A^* = \overline{\det A}$. Then

$$1 = \det I = \det A^*A = \det A^* \det A = \overline{\det A}\,\det A = |\det A|^2.$$

An orthogonal matrix thus has determinant $\pm 1$, while we conclude for a unitary matrix only that the determinant is a complex number of absolute value 1.

Examples.

(1) The 2-by-2 orthogonal matrices of determinant $+1$ are all matrices of the form $\begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}$. The 2-by-2 orthogonal matrices of determinant $-1$ are the products of $\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$ and the 2-by-2 orthogonal matrices of determinant $+1$.

(2) The 2-by-2 unitary matrices of determinant $+1$ are all matrices of the form $\begin{pmatrix} \alpha & \beta \\ -\bar\beta & \bar\alpha \end{pmatrix}$ with $|\alpha|^2 + |\beta|^2 = 1$; these may be regarded as parametrizing the points of the unit sphere $S^3$ of $\mathbb{R}^4$. The 2-by-2 unitary matrices of arbitrary determinant are the products of all matrices $\begin{pmatrix} 1 & 0 \\ 0 & e^{i\theta} \end{pmatrix}$ and the 2-by-2 unitary matrices of determinant $+1$.
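The matrices in Examples (1) and (2) can be checked numerically. The Python sketch below builds one instance of each family, with arbitrary parameter values, and tests $A^*A = I$ together with the determinant; the helper names are hypothetical.

```python
import cmath
import math


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def adjoint(A):
    return [[A[j][i].conjugate() for j in range(len(A))] for i in range(len(A[0]))]


def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]


def is_identity(M, tol=1e-12):
    return all(abs(M[i][j] - (1 if i == j else 0)) < tol
               for i in range(len(M)) for j in range(len(M)))


def rotation(theta):
    """Example (1): orthogonal with determinant +1."""
    return [[math.cos(theta), math.sin(theta)], [-math.sin(theta), math.cos(theta)]]


def su2(alpha, beta):
    """Example (2): unitary with determinant |alpha|^2 + |beta|^2 (1 here)."""
    return [[alpha, beta], [-beta.conjugate(), alpha.conjugate()]]


R = rotation(0.7)
F = matmul([[1, 0], [0, -1]], R)               # orthogonal, determinant -1
S = su2(0.6 + 0j, 0.8j)                        # |0.6|^2 + |0.8i|^2 = 1
W = matmul([[1, 0], [0, cmath.exp(0.3j)]], S)  # unitary, determinant e^{0.3i}
```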


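Proposition 3.19. If V is a finite-dimensional inner-product space, if $\Gamma = (u_1, \dots, u_n)$ and $\Delta = (v_1, \dots, v_n)$ are ordered orthonormal bases of V, and if $L : V \to V$ is a linear map that is orthogonal if $\mathbb{F} = \mathbb{R}$ and unitary if $\mathbb{F} = \mathbb{C}$, then the matrix $A = \begin{pmatrix} L \\ \Delta\Gamma \end{pmatrix}$ is orthogonal or unitary.

Proof. Proposition 3.15 and Theorem 2.16 give

$$A^*A = \begin{pmatrix} L^* \\ \Gamma\Delta \end{pmatrix}\begin{pmatrix} L \\ \Delta\Gamma \end{pmatrix} = \begin{pmatrix} L^*L \\ \Gamma\Gamma \end{pmatrix} = \begin{pmatrix} I \\ \Gamma\Gamma \end{pmatrix},$$

and the right side is the identity matrix, as required. □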


One consequence of Proposition 3.19 is that any matrix $\begin{pmatrix} I \\ \Delta\Gamma \end{pmatrix}$ relative to two ordered orthonormal bases is orthogonal or unitary, since the identity function $I : V \to V$ is certainly orthogonal or unitary. Thus a change from writing the matrix of a linear map L in one ordered orthonormal basis $\Gamma$ to writing the matrix of L in another ordered orthonormal basis $\Delta$ is implemented by the formula

$$\begin{pmatrix} L \\ \Gamma\Gamma \end{pmatrix} = C^{-1}\begin{pmatrix} L \\ \Delta\Delta \end{pmatrix}C,$$

where C is the orthogonal or unitary matrix $\begin{pmatrix} I \\ \Delta\Gamma \end{pmatrix}$.

Another consequence of Proposition 3.19 is that the matrix of an orthogonal or unitary linear map L in an ordered orthonormal basis $\Gamma$ is an orthogonal or unitary matrix. We have defined $\det L$ to be the determinant of $\begin{pmatrix} L \\ \Gamma\Gamma \end{pmatrix}$ relative to any $\Gamma$, and we conclude that $|\det L| = 1$.


3.  Spectral  Theorem 

In this section we deal with the geometric structure of certain kinds of linear maps from finite-dimensional inner-product spaces into themselves. We shall see that linear maps that are self-adjoint or unitary, among other possible conditions, have bases of eigenvectors in the sense of Section II.8. Moreover, such a basis may be taken to be orthonormal. When an ordered basis of eigenvectors is used for expressing the linear map as a matrix, the result is that the matrix is diagonal. Thus these linear maps have an especially uncomplicated structure. In terms of matrices, the result is that a Hermitian or unitary matrix A is similar to a diagonal matrix D, and the matrix C with $D = C^{-1}AC$ may be taken to be unitary. We begin with a lemma.

Lemma 3.20. If $L : V \to V$ is a self-adjoint linear map on an inner-product space V, then $v \mapsto (L(v), v)$ is real-valued, every eigenvalue of L is real, eigenvectors under L for distinct eigenvalues are orthogonal, and every vector subspace S of V with $L(S) \subseteq S$ has $L(S^\perp) \subseteq S^\perp$.

PROOF. The first two conclusions are contained in Proposition 3.17. If $v_1$ and $v_2$ are eigenvectors of L with distinct real eigenvalues $\lambda_1$ and $\lambda_2$, then

$$(\lambda_1 - \lambda_2)(v_1, v_2) = (\lambda_1v_1, v_2) - (v_1, \lambda_2v_2) = (L(v_1), v_2) - (v_1, L(v_2)) = 0.$$

Since $\lambda_1 \ne \lambda_2$, we must have $(v_1, v_2) = 0$. If S is a vector subspace with $L(S) \subseteq S$, then also $L(S^\perp) \subseteq S^\perp$ because $s \in S$ and $s^\perp \in S^\perp$ together imply

$$0 = (L(s), s^\perp) = (s, L(s^\perp)). \qquad\square$$

Theorem 3.21 (Spectral Theorem). Let $L : V \to V$ be a self-adjoint linear map on an inner-product space V. Then V has an orthonormal basis of eigenvectors of L. In addition, for each scalar $\lambda$, let

$$V_\lambda = \{v \in V \mid L(v) = \lambda v\},$$

so that $V_\lambda$ when nonzero is the eigenspace of L for the eigenvalue $\lambda$. Then the eigenvalues of L are all real, the vector subspaces $V_\lambda$ are mutually orthogonal, and any orthonormal basis of V of eigenvectors of L is the union of orthonormal bases of the $V_\lambda$'s. Correspondingly if A is any Hermitian n-by-n matrix, then there exists a unitary matrix C such that $C^{-1}AC$ is diagonal with real entries. If the matrix A has real entries, then C may be taken to be an orthogonal matrix.

PROOF. Lemma 3.20 shows that the eigenvalues of L are all real and that the vector subspaces $V_\lambda$ are mutually orthogonal.

To proceed further, we first assume that $\mathbb{F} = \mathbb{C}$. Applying the Fundamental Theorem of Algebra (Theorem 1.18) to the characteristic polynomial of L, we see that L has at least one eigenvalue, say $\lambda_1$. Then $L(V_{\lambda_1}) \subseteq V_{\lambda_1}$, and Lemma 3.20 shows that $L((V_{\lambda_1})^\perp) \subseteq (V_{\lambda_1})^\perp$. The vector subspace $(V_{\lambda_1})^\perp$ is an inner-product space, and the claim is that the restriction $L|_{(V_{\lambda_1})^\perp}$ is self-adjoint. In fact, if $v_1$ and $v_2$ are in $(V_{\lambda_1})^\perp$, then

$$\begin{aligned}
\big((L|_{(V_{\lambda_1})^\perp})^*(v_1), v_2\big) &= \big(v_1, L|_{(V_{\lambda_1})^\perp}(v_2)\big) = (v_1, L(v_2)) \\
&= (L(v_1), v_2) = \big(L|_{(V_{\lambda_1})^\perp}(v_1), v_2\big),
\end{aligned}$$

and the claim is proved. Since $\lambda_1$ is an eigenvalue of L, $\dim(V_{\lambda_1})^\perp < \dim V$. Therefore we can now set up an induction that ultimately exhibits V as an orthogonal direct sum $V = V_{\lambda_1} \oplus \cdots \oplus V_{\lambda_k}$. If v is an eigenvector of L with eigenvalue $\lambda'$, then either $\lambda' = \lambda_j$ for some j in this decomposition, in which case v is in $V_{\lambda_j}$, or $\lambda'$ is not equal to any $\lambda_j$, in which case v, by the lemma, is orthogonal to all vectors in $V_{\lambda_1} \oplus \cdots \oplus V_{\lambda_k}$, hence to all vectors in V; being orthogonal to all vectors in V, v must be 0, which is impossible for an eigenvector. Choosing an orthonormal basis for each $V_{\lambda_j}$ and taking their union provides an orthonormal basis of eigenvectors and completes the proof for L when $\mathbb{F} = \mathbb{C}$.

Next assume that A is a Hermitian n-by-n matrix. We define a linear map $L : \mathbb{C}^n \to \mathbb{C}^n$ by $L(v) = Av$, and we know from Proposition 3.15 that L is self-adjoint. The case just proved shows that L has an ordered orthonormal basis $\Gamma$ of eigenvectors, all the eigenvalues being real. If $\Sigma$ denotes the standard ordered basis of $\mathbb{C}^n$, then $D = \begin{pmatrix} L \\ \Gamma\Gamma \end{pmatrix}$ is diagonal with real entries and is equal to

$$\begin{pmatrix} I \\ \Gamma\Sigma \end{pmatrix}\begin{pmatrix} L \\ \Sigma\Sigma \end{pmatrix}\begin{pmatrix} I \\ \Sigma\Gamma \end{pmatrix} = C^{-1}AC, \qquad\text{where } C = \begin{pmatrix} I \\ \Sigma\Gamma \end{pmatrix}.$$

The matrix C is unitary by Proposition 3.19, and the formula $D = C^{-1}AC$ shows that A is as asserted.

Now let us return to L and suppose that $\mathbb{F} = \mathbb{R}$. The idea is to use the same argument as above in the case that $\mathbb{F} = \mathbb{C}$, but we need a substitute for the use of the Fundamental Theorem of Algebra. Fixing any orthonormal basis of V, let A be the matrix of L. Then A is Hermitian with real entries. The previous paragraph shows that any Hermitian matrix, whether or not real, has a characteristic polynomial that splits as a product $\prod_j (\lambda - r_j)^{m_j}$ with all $r_j$ real. Consequently L has this property as well. Thus any self-adjoint L when $\mathbb{F} = \mathbb{R}$ has an eigenvalue. Returning to the argument for L above when $\mathbb{F} = \mathbb{C}$, we readily see that it now applies when $\mathbb{F} = \mathbb{R}$.

Finally if A is a Hermitian matrix with real entries, then we can define a self-adjoint linear map $L : \mathbb{R}^n \to \mathbb{R}^n$ by $L(v) = Av$, obtain an orthonormal basis of eigenvectors for L, and argue as above to obtain $D = C^{-1}AC$, where D is diagonal and C is unitary. The matrix C has columns that are eigenvectors in $\mathbb{R}^n$ of the associated L, and these have real entries. Thus C is orthogonal. □
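The proof of the Spectral Theorem is existential. As a numerical counterpart for real symmetric matrices, one standard technique (not the method of the text) is the cyclic Jacobi eigenvalue method, which applies plane rotations G and replaces A by $G^tAG$ until the off-diagonal part is negligible; the accumulated product C of the rotations is orthogonal and $C^tAC$ is approximately diagonal. The Python sketch below assumes a real symmetric input, and its tolerance and sweep limit are arbitrary choices.

```python
import math


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(row) for row in zip(*X)]


def jacobi_eigen(A, tol=1e-12, max_sweeps=50):
    """Cyclic Jacobi method for a real symmetric matrix A.

    Returns (eigenvalues, C) with C orthogonal and C^t A C numerically diagonal;
    the columns of C are orthonormal eigenvectors, as in Theorem 3.21.
    """
    n = len(A)
    a = [[float(x) for x in row] for row in A]                 # becomes C^t A C
    c = [[float(i == j) for j in range(n)] for i in range(n)]  # accumulated C
    for _ in range(max_sweeps):
        off = math.sqrt(sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j))
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Angle chosen so that entry (p, q) of G^t a G is zero.
                theta = 0.5 * math.atan2(2 * a[p][q], a[p][p] - a[q][q])
                g = [[float(i == j) for j in range(n)] for i in range(n)]
                g[p][p] = g[q][q] = math.cos(theta)
                g[q][p] = math.sin(theta)
                g[p][q] = -math.sin(theta)
                a = matmul(transpose(g), matmul(a, g))
                c = matmul(c, g)
    return [a[i][i] for i in range(n)], c


A = [[2.0, 1.0, 0.0], [1.0, 2.0, 1.0], [0.0, 1.0, 2.0]]
eigenvalues, C = jacobi_eigen(A)
D = matmul(transpose(C), matmul(A, C))   # numerically diagonal
# The exact eigenvalues of this A are 2 - sqrt(2), 2, 2 + sqrt(2).
```

Each rotation is built as a full matrix for clarity; practical implementations update only rows and columns p and q.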


An important application of the Spectral Theorem is to the formation of a square root for any "positive semidefinite" linear map. We say that a linear map $L : V \to V$ on a finite-dimensional inner-product space is positive semidefinite if $L^* = L$ and $(L(v), v) \ge 0$ for all v in V. If $\mathbb{F} = \mathbb{C}$, then the condition $L^* = L$ is redundant, according to Proposition 3.17, but that fact will not be important for us. Similarly an n-by-n matrix A is positive semidefinite if $A^* = A$ and $\bar{x}^tAx \ge 0$ for all column vectors x. An example of a positive semidefinite n-by-n matrix is any matrix $A = B^*B$, where B is an arbitrary k-by-n matrix. In fact, if x is in $\mathbb{F}^n$, then $\bar{x}^tB^*Bx = \overline{(Bx)}^t(Bx)$, and the right side is $\ge 0$, being a sum of absolute values squared.
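Here is a quick numerical check of the example $A = B^*B$. The sketch below uses only the Python standard library. It forms $B^*B$ for a randomly chosen complex 2-by-3 matrix B and evaluates $\bar{x}^tAx$ at random vectors x. The helper names are ours, and sampling illustrates the inequality without proving it.

```python
import random


def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def conj_transpose(m):
    """B^* = conjugate transpose of a matrix stored as a list of rows."""
    return [[m[i][j].conjugate() for i in range(len(m))] for j in range(len(m[0]))]


def quad_form(a, x):
    """Return conj(x)^t A x for a column vector x stored as a list."""
    ax = [sum(a[i][j] * x[j] for j in range(len(x))) for i in range(len(a))]
    return sum(x[i].conjugate() * ax[i] for i in range(len(x)))


random.seed(1)
k, n = 2, 3                                   # B is k-by-n, so A = B^*B is n-by-n
B = [[complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(n)]
     for _ in range(k)]
A = matmul(conj_transpose(B), B)

for _ in range(5):
    x = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n)]
    value = quad_form(A, x)
    # the imaginary part is rounding noise; the real part equals ||Bx||^2
    print(round(value.real, 12) >= 0, abs(value.imag) < 1e-12)
```

Because B here has only 2 rows, A is 3-by-3 of rank at most 2. It is therefore positive semidefinite without being positive definite.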


Corollary 3.22. Let $L : V \to V$ be a self-adjoint linear map on a finite-dimensional inner-product space, and let A be an n-by-n Hermitian matrix. Then

(a) L or A is positive semidefinite if and only if all of its eigenvalues are $\ge 0$.

(b) whenever L or A is positive semidefinite, L or A is invertible if and only if $(L(v), v) > 0$ for all $v \ne 0$ or $\bar{x}^tAx > 0$ for all $x \ne 0$.

(c) whenever L or A is positive semidefinite, L or A has a unique positive semidefinite square root.

Remarks. A positive semidefinite linear map or matrix satisfying the condition in (b) is said to be positive definite, and the content of (b) is that a positive semidefinite linear map or matrix is positive definite if and only if it is invertible.

PROOF. We apply the Spectral Theorem (Theorem 3.21). For each conclusion the result for a matrix A is a special case of the result for the linear map L, and it is enough to treat only L. In (a), let $(u_1, \dots, u_n)$ be an ordered orthonormal basis of eigenvectors with respective eigenvalues $\lambda_1, \dots, \lambda_n$, not necessarily distinct. Then $(L(u_j), u_j) = \lambda_j$ shows the necessity of having $\lambda_j \ge 0$, while the computation

$$(L(v), v) = \Big(L\Big(\sum_i (v, u_i)u_i\Big), \sum_j (v, u_j)u_j\Big) = \Big(\sum_i \lambda_i(v, u_i)u_i, \sum_j (v, u_j)u_j\Big) = \sum_i \lambda_i|(v, u_i)|^2$$

shows the sufficiency.

In (b), if L fails to be invertible, then 0 is an eigenvalue for some eigenvector $v \ne 0$, and v has $(L(v), v) = 0$. Conversely if L is invertible, then all the eigenvalues $\lambda_i$ are $\ge 0$ by (a) and are nonzero by invertibility, hence $> 0$, and the computation in (a) yields

$$(L(v), v) = \sum_{i=1}^{n} \lambda_i|(v, u_i)|^2 \ge \big(\min_i \lambda_i\big)\sum_{i=1}^{n} |(v, u_i)|^2 = \big(\min_i \lambda_i\big)\|v\|^2,$$

the last step following from Parseval's equality (Theorem 3.11).

For existence in (c), the Spectral Theorem says that there exists an ordered orthonormal basis $\Gamma = (u_1, \dots, u_n)$ of eigenvectors of L, say with respective eigenvalues $\lambda_1, \dots, \lambda_n$. The eigenvalues are all $\ge 0$ by (a). The linear extension of the function P with $P(u_j) = \lambda_j^{1/2}u_j$ is given by

$$P(v) = \sum_{j=1}^{n} \lambda_j^{1/2}(v, u_j)u_j,$$

and it has

$$P^2(v) = \sum_j \lambda_j(v, u_j)u_j = \sum_j (v, u_j)L(u_j) = L\Big(\sum_j (v, u_j)u_j\Big) = L(v).$$

Thus $P^2 = L$. Relative to $\Gamma$, we have

$$\begin{pmatrix} P \\ \Gamma\,\Gamma \end{pmatrix}_{ij} = \big((P(u_j), u_1)u_1 + \cdots + (P(u_j), u_n)u_n\big)_i = (P(u_j), u_i) = \lambda_j^{1/2}\delta_{ij},$$

and this is a Hermitian matrix, being diagonal with real entries; Proposition 3.15 therefore shows that $P^* = P$. Finally

$$(P(v), v) = \Big(\sum_i \lambda_i^{1/2}(v, u_i)u_i, \sum_j (v, u_j)u_j\Big) = \sum_i \lambda_i^{1/2}|(v, u_i)|^2 \ge 0,$$

and thus P is positive semidefinite. This proves existence.

For uniqueness in (c), let P satisfy $P^* = P$ and $P^2 = L$, and suppose P is positive semidefinite. Choose an orthonormal basis of eigenvectors $u_1, \dots, u_n$ of P, say with eigenvalues $c_1, \dots, c_n$, all $\ge 0$. Then $L(u_j) = P^2(u_j) = c_j^2u_j$, and we see that $u_1, \dots, u_n$ form an orthonormal basis of eigenvectors of L with eigenvalues $c_j^2$. On the space where L acts as the scalar $\lambda_i$, P must therefore act as the scalar $\lambda_i^{1/2}$, since $c_j \ge 0$ and $c_j^2 = \lambda_i$ force $c_j = \lambda_i^{1/2}$. We conclude that P is unique. □
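The existence half of the proof is constructive. The minimal Python sketch below handles a real symmetric 2-by-2 matrix. It obtains the eigenbasis from the standard rotation-angle formula $\tan 2\theta = 2b/(a - d)$, a shortcut that is valid only in this small case. It then builds P exactly as in the proof, via $P(v) = \sum_j \lambda_j^{1/2}(v, u_j)u_j$. The function name and the rounding tolerance are our own choices.

```python
import math


def sqrt_psd_2x2(a, b, d):
    """Positive semidefinite square root of the real symmetric [[a, b], [b, d]].

    Mirrors the existence proof: find an orthonormal eigenbasis u_1, u_2,
    then let P act on u_j as multiplication by lambda_j ** 0.5.
    """
    theta = 0.5 * math.atan2(2 * b, a - d)       # rotation angle of the eigenbasis
    c, s = math.cos(theta), math.sin(theta)
    u = [(c, s), (-s, c)]                         # orthonormal eigenvectors
    lam = [a * x * x + 2 * b * x * y + d * y * y for x, y in u]   # (A u_j, u_j)
    if min(lam) < -1e-12:                         # small tolerance for rounding
        raise ValueError("matrix is not positive semidefinite")
    root = [math.sqrt(max(lv, 0.0)) for lv in lam]
    # P = sum_j lambda_j^(1/2) u_j u_j^t, i.e. P(v) = sum_j lambda_j^(1/2) (v, u_j) u_j
    return [[sum(root[k] * u[k][i] * u[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


P = sqrt_psd_2x2(5.0, 2.0, 2.0)    # the eigenvalues of [[5, 2], [2, 2]] are 6 and 1
```

For [[5, 2], [2, 2]] the returned P is symmetric, and P·P equals the original matrix up to rounding.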
The technique of proof of (c) allows one, more generally, to define f(L) for any function $f : \mathbb{R} \to \mathbb{C}$ whenever L is self-adjoint, namely by $f(L)(v) = \sum_j f(\lambda_j)(v, u_j)u_j$. Actually, the function f needs to be defined only on the set of eigenvalues of L for the definition to make sense.
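For a 2-by-2 Hermitian matrix, the recipe $f(A) = \sum_\lambda f(\lambda)E_\lambda$, with $E_\lambda$ the orthogonal projection onto the eigenspace for $\lambda$, can be written out explicitly. This works because $E_{\lambda_+} = (A - \lambda_-I)/(\lambda_+ - \lambda_-)$ when the two eigenvalues differ. The minimal Python sketch below implements this (the function name is ours). Note that f is called only at the eigenvalues, as remarked above.

```python
import math


def apply_function_2x2(f, a, b, d):
    """f(A) for the Hermitian matrix A = [[a, b], [conj(b), d]] with a, d real.

    Uses A = lam_plus E_plus + lam_minus E_minus, where E_plus and E_minus are
    the orthogonal projections onto the eigenspaces, and returns
    f(lam_plus) E_plus + f(lam_minus) E_minus.
    """
    b = complex(b)
    mean = (a + d) / 2
    rad = math.sqrt(((a - d) / 2) ** 2 + abs(b) ** 2)
    lp, lm = mean + rad, mean - rad               # the two real eigenvalues
    if rad == 0:                                  # A is a scalar matrix
        return [[f(lp) if i == j else 0 for j in range(2)] for i in range(2)]
    A = [[complex(a), b], [b.conjugate(), complex(d)]]
    eye = [[1, 0], [0, 1]]
    e_plus = [[(A[i][j] - lm * eye[i][j]) / (lp - lm) for j in range(2)] for i in range(2)]
    e_minus = [[(A[i][j] - lp * eye[i][j]) / (lm - lp) for j in range(2)] for i in range(2)]
    return [[f(lp) * e_plus[i][j] + f(lm) * e_minus[i][j] for j in range(2)]
            for i in range(2)]


R = apply_function_2x2(math.sqrt, 2.0, 1j, 2.0)   # A = [[2, i], [-i, 2]] has eigenvalues 3, 1
```

With f = math.sqrt and a positive semidefinite A, this reproduces the square root of Corollary 3.22c. Here R·R equals [[2, i], [-i, 2]] up to rounding.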

At  the  end  of  this  section,  we  shall  use  the  existence  of  the  square  root  in  (c)  to 
obtain the so-called “polar decomposition” of square matrices. But before doing
that,  let  us  mine  three  additional  easy  consequences  of  the  Spectral  Theorem. 
The  first  deals  with  several  self-adjoint  linear  maps  rather  than  one,  and  the  other 
two  apply  that  conclusion  to  deal  with  single  linear  maps  that  are  not  necessarily 
self-adjoint. 

Corollary 3.23. Let V be a finite-dimensional inner-product space, and let $L_1, \dots, L_m$ be self-adjoint linear maps from V to V that commute in the sense that $L_iL_j = L_jL_i$ for all i and j. Then V has an orthonormal basis of simultaneous eigenvectors of $L_1, \dots, L_m$. In addition, for each m-tuple of scalars $\lambda_1, \dots, \lambda_m$, let

$$V_{\lambda_1, \dots, \lambda_m} = \{v \in V \mid L_j(v) = \lambda_jv \text{ for } 1 \le j \le m\}$$

consist of 0 and the simultaneous eigenvectors of $L_1, \dots, L_m$ corresponding to $\lambda_1, \dots, \lambda_m$. Then all the eigenvalues $\lambda_j$ are real, the vector subspaces $V_{\lambda_1, \dots, \lambda_m}$ are mutually orthogonal, and any orthonormal basis of V of simultaneous eigenvectors of $L_1, \dots, L_m$ is the union of orthonormal bases of the $V_{\lambda_1, \dots, \lambda_m}$'s. Correspondingly if $A_1, \dots, A_m$ are commuting Hermitian n-by-n matrices, then there exists a unitary matrix C such that $C^{-1}A_jC$ is diagonal with real entries for all j. If all the matrices $A_j$ have real entries, then C may be taken to be an orthogonal matrix.

PROOF. This follows by iterating the Spectral Theorem (Theorem 3.21). In fact, let $\{V_{\lambda_1}\}$ be the system of vector subspaces produced by the theorem for $L_1$. For each i, the commutativity of the linear maps $L_j$ forces

$$L_1(L_i(v)) = L_i(L_1(v)) = L_i(\lambda_1v) = \lambda_1L_i(v) \qquad \text{for } v \in V_{\lambda_1},$$

and thus $L_i(V_{\lambda_1}) \subseteq V_{\lambda_1}$. The restrictions of $L_2, \dots, L_m$ to $V_{\lambda_1}$ are self-adjoint and commute. Let $\{V_{\lambda_1, \lambda_2}\}$ be the system of vector subspaces produced by the Spectral Theorem for $L_2|_{V_{\lambda_1}}$. Each of these, by the commutativity, is carried into itself by $L_3, \dots, L_m$, and the restrictions of $L_3, \dots, L_m$ to $V_{\lambda_1, \lambda_2}$ form a commuting family of self-adjoint linear maps. Continuing in this way, we arrive at the decomposition asserted by the corollary for $L_1, \dots, L_m$. The assertion of the corollary about commuting Hermitian matrices is a special case, in the same way that the assertions in Theorem 3.21 about matrices were special cases of the assertions about linear maps. □
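When the first map already has one-dimensional eigenspaces, the iteration in the proof stops after one step. Each eigenspace of $L_1$ is carried into itself by the other maps, so an orthonormal eigenbasis of $L_1$ diagonalizes all of them. The Python sketch below illustrates only this special case, for two commuting real symmetric 2-by-2 matrices. The matrices and helper names are illustrative choices.

```python
import math


def eigenbasis_2x2(a, b, d):
    """Rotation matrix whose columns are an orthonormal eigenbasis of [[a, b], [b, d]]."""
    theta = 0.5 * math.atan2(2 * b, a - d)
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def conjugate_by(c, m):
    """Return C^t M C, which equals C^{-1} M C for orthogonal C."""
    ct = [[c[j][i] for j in range(2)] for i in range(2)]
    return matmul(ct, matmul(m, c))


A1 = [[2.0, 1.0], [1.0, 2.0]]       # eigenvalues 3 and 1, each of multiplicity one
A2 = [[1.0, -3.0], [-3.0, 1.0]]     # A2 = 7I - 3 A1, so A1 A2 = A2 A1
assert matmul(A1, A2) == matmul(A2, A1)

C = eigenbasis_2x2(A1[0][0], A1[0][1], A1[1][1])   # computed from A1 alone
for M in (A1, A2):
    print([[round(v, 12) for v in row] for row in conjugate_by(C, M)])
```

Both printed matrices are diagonal up to rounding, with diagonal entries 3, 1 and -2, 4 respectively, even though C was computed from A1 alone.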


A linear map $L : V \to V$, not necessarily self-adjoint, is said to be normal if L commutes with its adjoint: $LL^* = L^*L$.

Corollary 3.24. Suppose that $\mathbb{F} = \mathbb{C}$, and let $L : V \to V$ be a normal linear map on the finite-dimensional inner-product space V. Then V has an orthonormal basis of eigenvectors of L. In addition, for each complex scalar $\lambda$, let

$$V_\lambda = \{v \in V \mid L(v) = \lambda v\},$$

so that $V_\lambda$ when nonzero is the eigenspace of L for the eigenvalue $\lambda$. Then the vector subspaces $V_\lambda$ are mutually orthogonal, and any orthonormal basis of V of eigenvectors of L is the union of orthonormal bases of the $V_\lambda$'s. Correspondingly if A is any n-by-n complex matrix such that $AA^* = A^*A$, then there exists a unitary matrix C such that $C^{-1}AC$ is diagonal.

Remark. The corollary fails if $\mathbb{F} = \mathbb{R}$: for the linear map $L : \mathbb{R}^2 \to \mathbb{R}^2$ with $L(v) = Av$ and $A = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$, $L^* = L^{-1}$ commutes with L, but L has no eigenvectors in $\mathbb{R}^2$ since the characteristic polynomial $\lambda^2 + 1$ has no first-degree factors with real coefficients.

PROOF. The point is that $L = \tfrac{1}{2}(L + L^*) + i\big(\tfrac{1}{2i}(L - L^*)\big)$ and that $\tfrac{1}{2}(L + L^*)$ and $\tfrac{1}{2i}(L - L^*)$ are self-adjoint. If L commutes with $L^*$, then $T_1 = \tfrac{1}{2}(L + L^*)$ and $T_2 = \tfrac{1}{2i}(L - L^*)$ commute with each other. We apply Corollary 3.23 to the commuting self-adjoint linear maps $T_1$ and $T_2$. The vector subspace $V_{\alpha, \beta}$ produced by Corollary 3.23 coincides with the vector subspace $V_{\alpha + i\beta}$ defined in the present corollary, and the result for L follows. The result for matrices is a special case. □
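The decomposition used in the proof is easy to compute. The Python sketch below (helper names ours) splits a square complex matrix M as $T_1 + iT_2$ with $T_1 = \frac{1}{2}(M + M^*)$ and $T_2 = \frac{1}{2i}(M - M^*)$. It then checks, for one normal and one non-normal example, that $T_1$ and $T_2$ commute exactly when M is normal.

```python
def adjoint(m):
    """Conjugate transpose of a square matrix."""
    n = len(m)
    return [[m[j][i].conjugate() for j in range(n)] for i in range(n)]


def matmul(x, y):
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def close(x, y, tol=1e-12):
    return all(abs(x[i][j] - y[i][j]) < tol for i in range(len(x)) for j in range(len(x)))


def hermitian_parts(m):
    """Return (T1, T2) with T1 = (M + M^*)/2 and T2 = (M - M^*)/(2i), so M = T1 + i T2."""
    ms = adjoint(m)
    n = len(m)
    t1 = [[(m[i][j] + ms[i][j]) / 2 for j in range(n)] for i in range(n)]
    t2 = [[(m[i][j] - ms[i][j]) / 2j for j in range(n)] for i in range(n)]
    return t1, t2


def is_normal(m):
    return close(matmul(m, adjoint(m)), matmul(adjoint(m), m))


M = [[1 + 0j, -2 + 0j], [2 + 0j, 1 + 0j]]      # normal: M M^* = M^* M = 5I
T1, T2 = hermitian_parts(M)
print(is_normal(M), close(matmul(T1, T2), matmul(T2, T1)))

N = [[0j, 1 + 0j], [0j, 0j]]                    # not normal
S1, S2 = hermitian_parts(N)
print(is_normal(N), close(matmul(S1, S2), matmul(S2, S1)))
```

The script prints True True and then False False. The first matrix is real, but its eigenvalues $1 \pm 2i$ are not real. This is why, as in the Remark, the corollary needs $\mathbb{F} = \mathbb{C}$.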

Corollary 3.25. Suppose that $\mathbb{F} = \mathbb{C}$, and let $L : V \to V$ be a unitary linear map on the finite-dimensional inner-product space V. Then V has an orthonormal basis of eigenvectors of L. In addition, for each complex scalar $\lambda$, let

$$V_\lambda = \{v \in V \mid L(v) = \lambda v\},$$

so that $V_\lambda$ when nonzero is the eigenspace of L for the eigenvalue $\lambda$. Then the eigenvalues of L all have absolute value 1, the vector subspaces $V_\lambda$ are mutually orthogonal, and any orthonormal basis of V of eigenvectors of L is the union of orthonormal bases of the $V_\lambda$'s. Correspondingly if A is any n-by-n unitary matrix, then there exists a unitary matrix C such that $C^{-1}AC$ is diagonal; the diagonal entries of $C^{-1}AC$ all have absolute value 1.

PROOF. This is a special case of Corollary 3.24 since a unitary linear map L has $LL^* = I = L^*L$. The eigenvalues all have absolute value 1 as a consequence of Proposition 3.18e. □


Now we come to the polar decomposition of linear maps and of matrices. When $\mathbb{F} = \mathbb{C}$, this is a generalization of the polar decomposition $z = e^{i\theta}|z|$ of complex numbers. When $\mathbb{F} = \mathbb{R}$, it generalizes the decomposition $x = (\operatorname{sgn} x)|x|$ of real numbers.

Theorem 3.26 (polar decomposition). If $L : V \to V$ is a linear map on a finite-dimensional inner-product space, then L decomposes as $L = UP$, where P is positive semidefinite and U is orthogonal if $\mathbb{F} = \mathbb{R}$ and unitary if $\mathbb{F} = \mathbb{C}$. The linear map P is unique, and U is unique if L is invertible. Correspondingly any n-by-n matrix A decomposes as $A = UP$, where P is a positive semidefinite matrix and U is an orthogonal matrix if $\mathbb{F} = \mathbb{R}$ and a unitary matrix if $\mathbb{F} = \mathbb{C}$. The matrix P is unique, and U is unique if A is invertible.

Remarks.  As  we  have  already  seen  in  other  situations,  the  motivation  for  the 
proof  comes  from  the  uniqueness. 

PROOF OF UNIQUENESS. Let $L = UP = U'P'$. Then $L^*L = PU^*UP = P^2$, and similarly $L^*L = P'^2$. The linear map $L^*L$ is positive semidefinite since its adjoint is $(L^*L)^* = L^*L^{**} = L^*L$ and since $(L^*L(v), v) = (L(v), L(v)) \ge 0$. Therefore Corollary 3.22c shows that $L^*L$ has a unique positive semidefinite square root. Hence $P = P'$. If L is invertible, then P is invertible and $L = UP$ implies that $U = LP^{-1}$. The same argument applies in the case of matrices. □

PROOF OF EXISTENCE. If L is given, then we have just seen that $L^*L$ is positive semidefinite. Let P be its unique positive semidefinite square root. The proof is clearer when L is invertible, and we consider that case first. Then we can set $U = LP^{-1}$. Since $U^* = (P^{-1})^*L^* = P^{-1}L^*$, we find that $U^*U = P^{-1}L^*LP^{-1} = P^{-1}P^2P^{-1} = I$, and we conclude that U is unitary.

When L is not necessarily invertible, we argue a little differently with the positive semidefinite square root P of $L^*L$. The kernel K of P is the 0 eigenspace of P, and the Spectral Theorem (Theorem 3.21) shows that the image of P is the sum of all the other eigenspaces and is just $K^\perp$. Since $K \cap K^\perp = 0$, P is one-one from $K^\perp$ onto itself. Thus $P(v) \mapsto L(v)$ is a one-one linear map from $K^\perp$ into V. Call this function U, so that $U(P(v)) = L(v)$. For any $v_1$ and $v_2$ in V, we have

$$(L(v_1), L(v_2)) = (L^*L(v_1), v_2) = (P^2(v_1), v_2) = (P(v_1), P(v_2)), \qquad (*)$$

and hence $U : K^\perp \to V$ preserves inner products. Let $\{u_1, \dots, u_k\}$ be an orthonormal basis of $K^\perp$, and let $\{u_{k+1}, \dots, u_n\}$ be an orthonormal basis of K. Since U preserves inner products and is linear, $\{U(u_1), \dots, U(u_k)\}$ is an orthonormal basis of $U(K^\perp)$. Extend $\{U(u_1), \dots, U(u_k)\}$ to an orthonormal basis of V by adjoining vectors $v_{k+1}, \dots, v_n$, define $U(u_j) = v_j$ for $k + 1 \le j \le n$, and write U also for the linear extension to all of V. Since U carries one orthonormal basis $\{u_1, \dots, u_n\}$ of V to another, U is unitary. We have $UP = L$ on $K^\perp$, and equation (*) with $v_1 = v_2$ shows that $\ker L = \ker P = K$. Therefore $UP = L$ everywhere. □
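For an invertible real 2-by-2 matrix, the existence proof can be followed directly: take P to be the square root of $A^tA$ and set $U = AP^{-1}$. To keep this Python sketch short, it computes the square root by the closed formula $\sqrt{M} = (M + \sqrt{\det M}\,I)/\sqrt{\operatorname{Tr} M + 2\sqrt{\det M}}$. That formula is valid for nonzero positive semidefinite 2-by-2 M and follows from the Cayley-Hamilton theorem. It is a shortcut standing in for the spectral construction of Corollary 3.22c, and the helper names are ours.

```python
import math


def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def transpose(x):
    return [[x[j][i] for j in range(2)] for i in range(2)]


def inverse(x):
    det = x[0][0] * x[1][1] - x[0][1] * x[1][0]
    return [[x[1][1] / det, -x[0][1] / det], [-x[1][0] / det, x[0][0] / det]]


def sqrt_psd(m):
    """Square root of a nonzero 2-by-2 positive semidefinite matrix via
    (M + sqrt(det M) I) / sqrt(tr M + 2 sqrt(det M))."""
    s = math.sqrt(max(m[0][0] * m[1][1] - m[0][1] * m[1][0], 0.0))
    t = math.sqrt(m[0][0] + m[1][1] + 2 * s)
    return [[(m[i][j] + (s if i == j else 0.0)) / t for j in range(2)] for i in range(2)]


def polar(a):
    """Polar decomposition A = U P of an invertible real 2-by-2 matrix."""
    p = sqrt_psd(matmul(transpose(a), a))     # P = (A^t A)^(1/2)
    u = matmul(a, inverse(p))                 # U = A P^{-1}
    return u, p


A = [[1.0, 2.0], [3.0, 4.0]]
U, P = polar(A)
print(round(U[0][0] * U[1][1] - U[0][1] * U[1][0], 12))   # det U
```

For this A one gets $U^tU = I$ and $UP = A$ up to rounding, and $\det U = -1$. Thus in the real case U may be a reflection rather than a rotation.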


4.  Problems 

1. Let $V = M_{nn}(\mathbb{C})$, and define an inner product on V by $(A, B) = \operatorname{Tr}(B^*A)$. The norm $\|\cdot\|_{HS}$ obtained from this inner product is called the Hilbert-Schmidt norm of the matrix in question.

(a) Prove that $\|A\|_{HS}^2 = \sum_{i,j}|A_{ij}|^2$ for A in V.

(b) Let $E_{ij}$ be the matrix that is 1 in the $(i, j)$th entry and is 0 elsewhere. Prove that the set of all $E_{ij}$ is an orthonormal basis of V.

(c)  Interpret  (a)  in  the  light  of  (b). 

(d) Prove that the Hilbert-Schmidt norm is given on any matrix A in V by

$$\|A\|_{HS}^2 = \sum_j \|Au_j\|^2 = \sum_{i,j}|v_i^*Au_j|^2,$$

where $\{u_1, \dots, u_n\}$ and $\{v_1, \dots, v_n\}$ are any orthonormal bases of $\mathbb{C}^n$ and $v^*$ refers to the conjugate transpose of any member v of $\mathbb{C}^n$.

(e) Let W be the vector subspace of all diagonal matrices in V. Describe explicitly the orthogonal complement $W^\perp$, and find its dimension.

2. Let $V_n$ be the inner-product space over $\mathbb{R}$ of all polynomials on [0, 1] of degree $\le n$ with real coefficients. (The 0 polynomial is to be included.) The Riesz Representation Theorem says that there is a unique polynomial $p_n$ such that $f(\tfrac{1}{2}) = \int_0^1 f(x)p_n(x)\,dx$ for all f in $V_n$. Set up a system of linear equations whose solution tells what $p_n$ is.

3.  Let  V  be  a  finite-dimensional  inner-product  space,  and  suppose  that  L  and  M 
are  self-adjoint  linear  maps  from  V  to  V.  Show  that  LM  is  self-adjoint  if  and 
only  if  LM  =  ML. 

4. Let V be a finite-dimensional inner-product space. If $L : V \to V$ is a linear map with adjoint $L^*$, prove that $\ker L = (\operatorname{image} L^*)^\perp$.

5. Find all 2-by-2 Hermitian matrices A with characteristic polynomial $\lambda^2 + 4\lambda + 6$.

6. Let $V_1$ and $V_2$ be finite-dimensional inner-product spaces over the same $\mathbb{F}$, the inner products being $(\cdot, \cdot)_1$ and $(\cdot, \cdot)_2$.

(a) Using the case when $V_1 = V_2$ as a model, define the adjoint of a linear map $L : V_1 \to V_2$, proving its existence. The adjoint is to be a linear map $L^* : V_2 \to V_1$.
(b) If $\Gamma$ is an orthonormal basis of $V_1$ and $\Delta$ is an orthonormal basis of $V_2$, prove that the matrices of L and $L^*$ in these bases are conjugate transposes of one another.

7. Suppose that a finite-dimensional inner-product space V is a direct sum $V = S \oplus T$ of vector subspaces. Let $E : V \to V$ be the linear map that is the identity on S and is 0 on T.

(a) Prove that $V = S^\perp \oplus T^\perp$.

(b) Prove that $E^* : V \to V$ is the linear map that is the identity on $T^\perp$ and is 0 on $S^\perp$.

8. (Iwasawa decomposition) Let g be an invertible n-by-n complex matrix. Apply the Gram-Schmidt orthogonalization process to the basis $\{ge_1, \dots, ge_n\}$, where $\{e_1, \dots, e_n\}$ is the standard basis, and let the resulting orthonormal basis be $\{v_1, \dots, v_n\}$. Define an invertible n-by-n matrix k such that $k^{-1}v_j = e_j$ for $1 \le j \le n$. Prove that $k^{-1}g$ is upper triangular with positive diagonal entries, and conclude that $g = k(k^{-1}g)$ exhibits g as the product of a unitary matrix and an upper triangular matrix whose diagonal entries are positive.
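The construction in Problem 8 is the Gram-Schmidt process, and it can be run numerically. In this Python sketch (helper names ours), each projection uses the current residual, a variant that agrees with the usual process in exact arithmetic. The inner product is taken linear in the first variable, as in the text. The sketch illustrates the claim for a single matrix.

```python
def inner(x, y):
    """(x, y) = sum_i x_i conj(y_i): linear in x, conjugate linear in y."""
    return sum(a * b.conjugate() for a, b in zip(x, y))


def gram_schmidt(vectors):
    """Orthonormalize the vectors in order, projecting the running residual."""
    basis = []
    for w in vectors:
        v = list(w)
        for u in basis:
            c = inner(v, u)
            v = [vi - c * ui for vi, ui in zip(v, u)]
        norm = abs(inner(v, v)) ** 0.5
        basis.append([vi / norm for vi in v])
    return basis


def iwasawa(g):
    """Return (k, b) with g = k b, k unitary, b upper triangular with diag(b) > 0."""
    n = len(g)
    cols = [[g[i][j] for i in range(n)] for j in range(n)]      # g e_1, ..., g e_n
    vs = gram_schmidt(cols)                                      # v_1, ..., v_n
    k = [[vs[j][i] for j in range(n)] for i in range(n)]         # k e_j = v_j
    # k is unitary, so k^{-1} = k^*; the entries of b = k^* g are (g e_j, v_i)
    b = [[sum(k[r][i].conjugate() * g[r][j] for r in range(n)) for j in range(n)]
         for i in range(n)]
    return k, b


g = [[1 + 0j, 2j], [1j, 3 + 0j]]      # invertible: det g = 5
k, b = iwasawa(g)
```

For this sample g, the factor b is upper triangular with diagonal entries $\sqrt{2}$ and $5/\sqrt{2}$.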

9. Let A be an n-by-n positive definite matrix.

(a) Prove that $\det A > 0$.

(b) Prove for any subset of integers $1 \le i_1 < i_2 < \cdots < i_k \le n$ that the submatrix of A built from rows and columns indexed by $(i_1, \dots, i_k)$ is positive definite.

10.  Prove  that  if  A  is  a  positive  definite  n-by-n  matrix,  then  there  exists  an  n-by-n 
upper-triangular  matrix  B  with  positive  diagonal  entries  such  that  A  =  B*B. 

11. The most general 2-by-2 Hermitian matrix is of the form $A = \begin{pmatrix} a & b \\ \bar{b} & d \end{pmatrix}$ with a and d real and with b complex. Find a diagonal matrix D and a unitary matrix U such that $D = U^{-1}AU$.

12.  In  the  previous  problem, 

(a)  what  conditions  on  A  make  A  positive  definite? 

(b)  when  A  is  positive  definite,  how  can  its  positive  definite  square  root  be 
computed  explicitly? 

13. Prove that if an n-by-n real symmetric matrix A has $v^tAv = 0$ for all v in $\mathbb{R}^n$, then A = 0.

14. Let $L : \mathbb{C}^n \to \mathbb{C}^n$ be a self-adjoint linear map. Show for each $x \in \mathbb{C}^n$ that there is some $y \in \mathbb{C}^n$ such that $(I - L)^2(y) = (I - L)(x)$.

15. In the polar decomposition $L = UP$, prove that if P and U commute, then L is normal.

16. Let V be an n-dimensional inner-product space over $\mathbb{R}$. What is the largest possible dimension of a commuting family of self-adjoint linear maps $L : V \to V$?
17. Let $v_1, \dots, v_n$ be an ordered list of vectors in an inner-product space. The associated Gram matrix is the Hermitian matrix of inner products given by $G(v_1, \dots, v_n) = [(v_i, v_j)]$, and $\det G(v_1, \dots, v_n)$ is called its Gram determinant.

(a) If $c_1, \dots, c_n$ are in $\mathbb{C}$, let c be the column vector with entries $c_1, \dots, c_n$. Prove that $c^tG(v_1, \dots, v_n)\bar{c} = \|c_1v_1 + \cdots + c_nv_n\|^2$, and conclude that $G(v_1, \dots, v_n)$ is positive semidefinite.

(b) Prove that $\det G(v_1, \dots, v_n) \ge 0$ with equality if and only if $v_1, \dots, v_n$ are linearly dependent. (This generalizes the Schwarz inequality.)

(c) Under what circumstances does equality hold in the Schwarz inequality?

Problems 18-23 introduce the Legendre polynomials and establish some of their elementary properties, including their orthogonality under the inner product $(P, Q) = \int_{-1}^{1}P(x)Q(x)\,dx$. They form the simplest family of classical orthogonal polynomials. They are uniquely determined by the conditions that the nth one $P_n$, for $n \ge 0$, is of degree n, they are orthogonal under $(\cdot, \cdot)$, and they are normalized so that $P_n(1) = 1$. But these conditions are a little hard to work with initially, and instead we adopt the recursive definition $P_0(x) = 1$, $P_1(x) = x$, and

$$(n + 1)P_{n+1}(x) = (2n + 1)xP_n(x) - nP_{n-1}(x) \qquad \text{for } n \ge 1.$$
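The recursive definition lends itself to exact computation with rational coefficients. The following Python sketch (helper names ours) builds $P_0, \dots, P_N$ from the recursion. It also evaluates $(P, Q) = \int_{-1}^{1}P(x)Q(x)\,dx$ exactly, so that the normalization and the orthogonality can be checked for small n.

```python
from fractions import Fraction


def legendre(n_max):
    """Coefficient lists (constant term first) of P_0, ..., P_{n_max}, built from
    (n + 1) P_{n+1}(x) = (2n + 1) x P_n(x) - n P_{n-1}(x)."""
    polys = [[Fraction(1)], [Fraction(0), Fraction(1)]]
    for n in range(1, n_max):
        x_pn = [Fraction(0)] + polys[n]              # coefficients of x P_n
        prev = polys[n - 1] + [Fraction(0)] * 2      # P_{n-1}, padded to length n + 2
        polys.append([((2 * n + 1) * a - n * b) / (n + 1) for a, b in zip(x_pn, prev)])
    return polys[: n_max + 1]


def inner(p, q):
    """(P, Q) = integral of P(x) Q(x) over [-1, 1], computed exactly."""
    prod = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            prod[i + j] += a * b
    # the integral of x^k over [-1, 1] is 2 / (k + 1) for even k and 0 for odd k
    return sum((c * Fraction(2, k + 1) for k, c in enumerate(prod) if k % 2 == 0),
               Fraction(0))


def evaluate(p, x):
    return sum(c * x ** k for k, c in enumerate(p))


P = legendre(5)
print(P[2], P[3])
print([evaluate(p, 1) for p in P], inner(P[2], P[4]), inner(P[3], P[3]))
```

Here P[2] comes out as the coefficients of $(3x^2 - 1)/2$ and P[3] as those of $(5x^3 - 3x)/2$. Also inner(P[3], P[3]) equals 2/7, in agreement with Problem 21.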

18. (a) Prove that $P_n(x)$ has degree n, that $P_n(-x) = (-1)^nP_n(x)$, and that $P_n(1) = 1$. In particular, $P_n$ is an even function if n is even and is an odd function if n is odd.

(b)  Let  c<n>  be  the  constant  term  of  P„  if  n  is  even  and  the  coefficient  of  x  if  n 
is  odd,  so  that  c(0)  =  c(1)  =  1.  Prove  that  c(,i)  =  —  for  n  >  2. 

n  — 

19. This problem establishes a useful concrete formula for $P_n(x)$. Let $D = d/dx$ and $X = x^2 - 1$, writing $X' = 2x$, $X'' = 2$, and $X''' = 0$ for the derivatives. Two parts of this problem make use of the Leibniz rule $D^n(fg) = \sum_{k=0}^{n}\binom{n}{k}(D^{n-k}f)(D^kg)$ for higher-order derivatives of a product.

(a) Verify that $D^2(X^{n+1}) = (2n + 1)D(X^nX') - n(2n + 1)X''X^n - 4n^2X^{n-1}$.

(b) By applying $D^{n-1}$ to the result of (a) and rearranging terms, show that $D^{n+1}(X^{n+1}) = (2n + 1)X'D^n(X^n) - 4n^2D^{n-1}(X^{n-1})$.

(c) Put $R_n(x) = (2^nn!)^{-1}D^n(X^n)$ for $n \ge 0$. Show that $R_0(x) = 1$, $R_1(x) = x$, and $(n + 1)R_{n+1}(x) = (2n + 1)xR_n(x) - nR_{n-1}(x)$ for $n \ge 1$.

(d) (Rodrigues's formula) Conclude that $2^nn!\,P_n(x) = \left(\frac{d}{dx}\right)^n\big[(x^2 - 1)^n\big]$.

20. Using Rodrigues's formula and iterated integration by parts, prove that

$$\int_{-1}^{1}P_m(x)P_n(x)\,dx = 0 \qquad \text{for } m < n.$$

Conclude that $\{P_0, P_1, \dots, P_n\}$ is an orthogonal basis of the inner-product space of polynomials on $[-1, 1]$ of degree $\le n$ with inner product $(\cdot, \cdot)$.
21. Arguing as in the previous problem and taking for granted that $\int_{-1}^{1}(1 - x^2)^n\,dx = \frac{2^{2n+1}(n!)^2}{(2n + 1)!}$, prove that $(P_n, P_n) = (n + \tfrac{1}{2})^{-1}$.

22. This problem shows that $P_n(x)$ satisfies a certain second-order differential equation. Let $D = d/dx$. The first two parts of this problem use the Leibniz rule quoted in Problem 19. Let $X = x^2 - 1$ and $K_n = 2^nn!$, so that Rodrigues's formula says that $K_nP_n = D^n(X^n)$.

(a) Expand $D^{n+1}\big[(D(X^n))X\big]$ by the Leibniz rule.

(b) Observe that $(D(X^n))X = nX^nX'$, and expand $D^{n+1}\big[nX^nX'\big]$ by the Leibniz rule.

(c) Equating the results of the previous two parts, conclude that $y = P_n(x)$ satisfies the differential equation $(1 - x^2)y'' - 2xy' + n(n + 1)y = 0$.

23. Let $P_n(x) = \sum_{k=0}^{n}c_kx^k$. Using the differential equation, show that the coefficients $c_k$ satisfy $k(k - 1)c_k = \big[(k - 2)(k - 1) - n(n + 1)\big]c_{k-2}$ for $k \ge 2$ and that $c_k = 0$ unless $n - k$ is even.

Problems 24-28 concern the complex conjugate of an inner-product space over $\mathbb{C}$. For any finite-dimensional inner-product space V, the Riesz Representation Theorem identifies the dual V' with V, saying that each member of V' is given by taking the inner product with some member of V. When the scalars are real, this identification is linear; thus the Riesz theorem uses the inner product to construct a canonical isomorphism of V onto V'. When the scalars are complex, the identification is conjugate linear, and we do not get an isomorphism of V with V'. The complex conjugate of V provides a substitute result.

24. Let V be a finite-dimensional vector space over $\mathbb{C}$. Define a new complex vector space $\overline{V}$ as follows: The elements of $\overline{V}$ are the elements of V, and the definition of addition is unchanged. However, there is a change in the definition of scalar multiplication, in that if v is in $\overline{V}$, then the product cv in $\overline{V}$ is to equal the product $\bar{c}v$ in V. Verify that $\overline{V}$ is indeed a complex vector space.

25. If V is a complex vector space and $L : V \to V$ is a linear map, define $\overline{L} : \overline{V} \to \overline{V}$ to be the same function as L. Prove that $\overline{L}$ is linear.

26. Suppose that the complex vector space V is actually a finite-dimensional inner-product space, with inner product $(\cdot, \cdot)_V$. Define $(u, v)_{\overline{V}} = (v, u)_V$. Verify that $\overline{V}$ is an inner-product space.

27. With $\overline{V}$ as in the previous problem, show that the Riesz Representation Theorem uses the inner product to set up a canonical isomorphism of V' with $\overline{V}$.

28. With V and $\overline{V}$ as in the two previous problems, let $L : V \to V$ be linear, so that $(\overline{L})^* : \overline{V} \to \overline{V}$ is linear. Under the identification of the previous problem of V' with $\overline{V}$, show that $(\overline{L})^*$ corresponds to the contragredient $L^t$ as defined in Section II.4.
Problems 29-32 use inner-product spaces to obtain a decomposition of polynomials in several variables. A real-valued polynomial function p in $x_1, \dots, x_n$ is said to be homogeneous of degree N if every monomial in p has total degree N. Let $V_N$ be the space of real-valued polynomials in $x_1, \dots, x_n$ homogeneous of degree N. For any homogeneous polynomial p, we define a differential operator $\partial(p)$ with constant coefficients by requiring that $\partial(\cdot)$ be linear in $(\cdot)$ and that

$$\partial\big(x_1^{k_1}\cdots x_n^{k_n}\big) = \frac{\partial^{k_1 + \cdots + k_n}}{\partial x_1^{k_1}\cdots\partial x_n^{k_n}}.$$

For example, if $|x|^2$ stands for $x_1^2 + \cdots + x_n^2$, then $\partial(|x|^2) = \Delta = \frac{\partial^2}{\partial x_1^2} + \cdots + \frac{\partial^2}{\partial x_n^2}$, the Laplacian. If p and q are in the same $V_N$, then $\partial(q)p$ is a constant polynomial, and we define $(p, q)$ to be that constant. Then $(\cdot, \cdot)$ is bilinear.
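To experiment with the pairing $(p, q) = \partial(q)p$, one can store a polynomial as a dictionary from exponent tuples $(k_1, \dots, k_n)$ to coefficients. This representation and the helper names below are our own choices. The Python sketch applies $\partial(q)$ to p by repeated differentiation in one variable at a time.

```python
from math import prod


def differentiate(p, var, times):
    """Apply (d/dx_var)**times to p, stored as {exponent tuple: coefficient}."""
    out = {}
    for exps, coeff in p.items():
        e = exps[var]
        if e < times:
            continue                                   # monomial is annihilated
        factor = prod(range(e - times + 1, e + 1))     # e! / (e - times)!
        key = exps[:var] + (e - times,) + exps[var + 1:]
        out[key] = out.get(key, 0) + coeff * factor
    return {k: v for k, v in out.items() if v != 0}


def apply_operator(q, p):
    """The constant-coefficient operator d(q) applied to the polynomial p."""
    result = {}
    for exps, coeff in q.items():
        term = dict(p)
        for var, k in enumerate(exps):                 # d^k / dx_var^k
            term = differentiate(term, var, k)
        for key, v in term.items():
            result[key] = result.get(key, 0) + coeff * v
    return {k: v for k, v in result.items() if v != 0}


def pairing(p, q, n):
    """(p, q) = the constant polynomial d(q)p, for p, q in the same V_N in n variables."""
    return apply_operator(q, p).get((0,) * n, 0)


# two variables: |x|^2 = x1^2 + x2^2, so d(|x|^2) is the Laplacian
norm_sq = {(2, 0): 1, (0, 2): 1}
print(apply_operator(norm_sq, {(2, 0): 1, (0, 2): -1}))   # x1^2 - x2^2
print(pairing({(1, 1): 1}, {(1, 1): 1}, 2))               # (x1 x2, x1 x2)
```

The first print shows {}, so $x_1^2 - x_2^2$ is harmonic. The second shows $(x_1x_2, x_1x_2) = 1$.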

29. (a) Prove that $(\cdot, \cdot)$ satisfies $(p, q) = (q, p)$.

(b) Prove that $\big(x_1^{k_1}\cdots x_n^{k_n}, x_1^{l_1}\cdots x_n^{l_n}\big)$ is positive if $(k_1, \dots, k_n) = (l_1, \dots, l_n)$ and is 0 otherwise.

(c) Deduce that $(\cdot, \cdot)$ is an inner product on $V_N$.

30. Call $p \in V_N$ harmonic if $\partial(|x|^2)p = 0$, and let $H_N$ be the vector subspace of harmonic polynomials. Prove that the orthogonal complement of $|x|^2V_{N-2}$ in $V_N$ relative to $(\cdot, \cdot)$ is $H_N$.

31. Deduce from Problem 30 that each $p \in V_N$ decomposes uniquely as

$$p = h_N + |x|^2h_{N-2} + |x|^4h_{N-4} + \cdots$$

with $h_N, h_{N-2}, h_{N-4}, \dots$ homogeneous harmonic of the indicated degrees.

32. For n = 2, describe a computational procedure for decomposing the element $x_1^4 + x_2^4$ of $V_4$ as in Problem 31.

Problems 33-34 concern products of n-by-n positive semidefinite matrices. They make use of Problem 26 in Chapter II, which says that $\det(\lambda I - CD) = \det(\lambda I - DC)$.

33. Let A and B be positive semidefinite. Using the positive semidefinite square root of B, prove that every eigenvalue of AB is $\ge 0$.

34. Let A, B, and C be positive semidefinite, and suppose that ABC is Hermitian. Under the assumption that C is invertible, introduce the positive definite square root P of C. By considering $P^{-1}ABCP^{-1}$, prove that ABC is positive semidefinite.


CHAPTER  IV 


Groups  and  Group  Actions 


Abstract.  This  chapter  develops  the  basics  of  group  theory,  with  particular  attention  to  the  role  of 
group  actions  of  various  kinds.  The  emphasis  is  on  groups  in  Sections  1-3  and  on  group  actions 
starting  in  Section  6.  In  between  is  a  two-section  digression  that  introduces  rings,  fields,  vector 
spaces  over  general  fields,  and  polynomial  rings  over  commutative  rings  with  identity. 

Section  1  introduces  groups  and  a  number  of  examples,  and  it  establishes  some  easy  results. 
Most  of  the  examples  arise  either  from  number-theoretic  settings  or  from  geometric  situations  in 
which  some  auxiliary  space  plays  a  role.  The  direct  product  of  two  groups  is  discussed  briefly  so 
that  it  can  be  used  in  a  table  of  some  groups  of  low  order. 

Section  2  defines  coset  spaces,  normal  subgroups,  homomorphisms,  quotient  groups,  and  quotient 
mappings.  Lagrange’s  Theorem  is  a  simple  but  key  result.  Another  simple  but  key  result  is  the 
construction  of  a  homomorphism  with  domain  a  quotient  group  G/H  when  a  given  homomorphism 
is trivial on H. The section concludes with two standard isomorphism theorems.

Section  3  introduces  general  direct  products  of  groups  and  direct  sums  of  abelian  groups,  together 
with  their  concrete  “external”  versions  and  their  universal  mapping  properties. 

Sections  4-5  are  a  digression  to  define  rings,  fields,  and  ring  homomorphisms,  and  to  extend  the 
theories concerning polynomials and vector spaces as presented in Chapters I-II. The immediate
purpose  of  the  digression  is  to  make  prime  fields  and  the  notion  of  characteristic  available  for  the 
remainder  of  the  chapter.  The  definitions  of  polynomials  are  extended  to  allow  coefficients  from  any 
commutative  ring  with  identity  and  to  allow  more  than  one  indeterminate,  and  universal  mapping 
properties  for  polynomial  rings  are  proved. 

Sections  6-7  introduce  group  actions.  Section  6  gives  some  geometric  examples  beyond  those 
in  Section  1,  it  establishes  a  counting  formula  concerning  orbits  and  isotropy  subgroups,  and  it 
develops  some  structure  theory  of  groups  by  examining  specific  group  actions  on  the  group  and  its 
coset  spaces.  Section  7  uses  a  group  action  by  automorphisms  to  define  the  semidirect  product  of 
two  groups.  This  construction,  in  combination  with  results  from  Sections  5-6,  allows  one  to  form 
several  new  finite  groups  of  interest. 

Section  8  defines  simple  groups,  proves  that  alternating  groups  on  five  or  more  letters  are  simple, 
and then establishes the Jordan-Hölder Theorem concerning the consecutive quotients that arise
from  composition  series. 

Section 9 deals with finitely generated abelian groups. It is proved that “rank” is well defined
for  any  finitely  generated  free  abelian  group,  that  a  subgroup  of  a  free  abelian  group  of  finite  rank  is 
always  free  abelian,  and  that  any  finitely  generated  abelian  group  is  the  direct  sum  of  cyclic  groups. 

Section  10  returns  to  structure  theory  for  finite  groups.  It  begins  with  the  Sylow  Theorems, 
which  produce  subgroups  of  prime-power  order,  and  it  gives  two  sample  applications.  One  of  these 
classifies  the  groups  of  order  pq,  where  p  and  q  are  distinct  primes,  and  the  other  provides  the 
information  necessary  to  classify  the  groups  of  order  12. 

Section 11 introduces the language of “categories” and “functors.” The notion of category is a
precise  version  of  what  is  sometimes  called  a  “context”  at  points  in  the  book  before  this  section, 
and some of the “constructions” in the book are examples of “functors.” The section treats in this language the notions of “product” and “coproduct,” which are abstractions of “direct product” and “direct sum.”


1.  Groups  and  Subgroups 

Linear  algebra  and  group  theory  are  two  foundational  subjects  for  all  of  algebra, 
indeed  for  much  of  mathematics.  Chapters  II  and  III  have  introduced  the  basics 
of  linear  algebra,  and  the  present  chapter  introduces  the  basics  of  group  theory.  In 
this  section  we  give  the  definition  and  notation  for  groups  and  provide  examples 
that  fit  with  the  historical  development  of  the  notion  of  group.  Many  readers  will 
already  be  familiar  with  some  group  theory,  and  therefore  we  can  be  brief  at  the 
start. 

A group is a nonempty set G with an operation $G \times G \to G$ satisfying the
three properties (i), (ii), and (iii) below. In the absence of any other information
the operation is usually called multiplication and is written $(a, b) \mapsto ab$ with no
symbol to indicate the multiplication. The defining properties of a group are

(i) $(ab)c = a(bc)$ for all a, b, c in G (associative law),

(ii) there exists an element 1 in G such that $a1 = 1a = a$ for all a in G
(existence of identity),

(iii) for each a in G, there exists an element $a^{-1}$ in G with $aa^{-1} = a^{-1}a = 1$
(existence of inverses).

It  is  immediate  from  these  properties  that 

• 1 is unique (since $1' = 1'1 = 1$),

• $a^{-1}$ is unique (since $(a^{-1})' = (a^{-1})'1 = (a^{-1})'(aa^{-1}) = ((a^{-1})'a)a^{-1} = 1a^{-1} = a^{-1}$),

• the existence of a left inverse for each element implies the existence of a
right inverse for each element (since $ba = 1$ and $cb = 1$ together imply
$c = c(ba) = (cb)a = a$ and hence also $ab = cb = 1$),

• 1 is its own inverse (since $11 = 1$),

• $ax = ay$ implies $x = y$, and $xa = ya$ implies $x = y$ (cancellation laws)
(since $x = 1x = (a^{-1}a)x = a^{-1}(ax) = a^{-1}(ay) = (a^{-1}a)y = 1y = y$
and since a similar argument proves the second implication).
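For a finite set with a given operation, properties (i), (ii), and (iii) can be checked by brute force. The following Python sketch does this; the function name `is_group` is our own, and because associativity is tested on every triple it is only practical for small sets.

```python
from itertools import product

def is_group(elements, op):
    """Brute-force check of the group axioms on a finite set.

    `elements` is an iterable of values and `op(a, b)` returns the product ab.
    Returns the identity element if the axioms hold, otherwise None.
    """
    elems = list(elements)
    # The operation must carry G x G into G.
    if any(op(a, b) not in elems for a, b in product(elems, repeat=2)):
        return None
    # (i) associative law, checked on every triple
    if any(op(op(a, b), c) != op(a, op(b, c))
           for a, b, c in product(elems, repeat=3)):
        return None
    # (ii) an element e with ae = ea = a for every a
    identities = [e for e in elems
                  if all(op(a, e) == a == op(e, a) for a in elems)]
    if not identities:
        return None
    e = identities[0]
    # (iii) every a has some b with ab = ba = e
    if not all(any(op(a, b) == e == op(b, a) for b in elems) for a in elems):
        return None
    return e

print(is_group(range(6), lambda a, b: (a + b) % 6))      # 0
print(is_group(range(1, 7), lambda a, b: (a * b) % 7))   # 1
print(is_group(range(1, 6), lambda a, b: (a * b) % 6))   # None
```

Because the identity returned for the integers mod 6 under addition is 0, which Python treats as false, a caller should compare the result with `None` rather than test its truth value.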

Problem  2  at  the  end  of  Chapter  II  shows  that  the  associative  law  extends  to 
products  of  any  finite  number  of  elements  of  G  as  follows:  parentheses  can 
be  inserted  in  any  fashion  in  such  a  product,  and  the  value  of  the  product  is 
unchanged; hence any expression $a_1 a_2 \cdots a_n$ in G is well defined without the use
of  parentheses. 

The  group  whose  only  element  is  the  identity  1  will  be  denoted  by  {1}.  It  is 
called  the  trivial  group. 
We come to other examples in a moment. First we make three more definitions
and  offer  some  comments.  A  subgroup  H  of  a  group  G  is  a  subset  containing 
the  identity  that  is  closed  under  multiplication  and  inverses.  Then  H  itself  is  a 
group  because  the  associativity  in  G  implies  associativity  in  H.  The  intersection 
of  any  nonempty  collection  of  subgroups  of  G  is  again  a  subgroup. 

An isomorphism of a group $G_1$ with a group $G_2$ is a function $\varphi : G_1 \to G_2$
that is one-one onto and satisfies $\varphi(ab) = \varphi(a)\varphi(b)$ for all a and b in $G_1$. It is
immediate that

• $\varphi(1) = 1$ (by taking $a = b = 1$),

• $\varphi(a^{-1}) = \varphi(a)^{-1}$ (by taking $b = a^{-1}$),

• $\varphi^{-1} : G_2 \to G_1$ satisfies $\varphi^{-1}(cd) = \varphi^{-1}(c)\varphi^{-1}(d)$ (by taking $c = \varphi(a)$
and $d = \varphi(b)$ on the right side and then observing that
$\varphi(\varphi^{-1}(c)\varphi^{-1}(d)) = \varphi(ab) = \varphi(a)\varphi(b) = cd = \varphi(\varphi^{-1}(cd))$).

The  first  and  second  of  these  properties  show  that  an  isomorphism  respects  all  the 
structure  of  a  group,  not  just  products.  The  third  property  shows  that  the  inverse 
of  an  isomorphism  is  an  isomorphism,  hence  that  the  relation  “is  isomorphic  to”  is 
symmetric.  Since  the  identity  isomorphism  exhibits  this  relation  as  reflexive  and 
since  the  use  of  compositions  shows  that  it  is  transitive,  we  see  that  “is  isomorphic 
to” is an equivalence relation. Common notation for an isomorphism between
$G_1$ and $G_2$ is $G_1 \cong G_2$; because of the symmetry, one can say that $G_1$ and $G_2$
are isomorphic.
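For small finite groups the definition of isomorphism can be tested directly. In this sketch (the helper `is_isomorphism` is hypothetical) a map is given as a Python dict; the example compares the integers 0, 1, 2, 3 under addition modulo 4 with the fourth roots of unity {1, i, −1, −i} under multiplication, using n ↦ i^n. These four complex values multiply exactly in floating point, so `==` is safe here.

```python
def is_isomorphism(phi, G1, op1, G2, op2):
    """True if the dict `phi` is a one-one map of G1 onto G2
    with phi(ab) = phi(a)phi(b) for all a, b in G1."""
    image = [phi[a] for a in G1]
    one_one = len(set(image)) == len(G1)
    onto = set(image) == set(G2)
    respects_products = all(phi[op1(a, b)] == op2(phi[a], phi[b])
                            for a in G1 for b in G1)
    return one_one and onto and respects_products

Z4 = [0, 1, 2, 3]
roots = [1, 1j, -1, -1j]                 # fourth roots of unity in C
phi = {0: 1, 1: 1j, 2: -1, 3: -1j}       # n -> i**n
print(is_isomorphism(phi, Z4, lambda a, b: (a + b) % 4,
                     roots, lambda z, w: z * w))   # True
```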

An  abelian  group  is  a  group  G  with  the  additional  property 

(iv)  ab  =  ba  for  all  a  and  b  in  G  (commutative  law). 

In an abelian group the operation is sometimes, but by no means always, called
addition instead of “multiplication.” Addition is typically written $(a, b) \mapsto a + b$,
and then the identity is usually denoted by 0 and the inverse of a is denoted by $-a$,
the negative of a. Depending on circumstances, the trivial abelian group may
be denoted by {0} or 0. Problem 3 at the end of Chapter II shows for an abelian
group G with its operation written additively that n-fold sums of elements of G
can be written in any order: $a_1 + a_2 + \cdots + a_n = a_{\sigma(1)} + a_{\sigma(2)} + \cdots + a_{\sigma(n)}$ for
each permutation σ of $\{1, \dots, n\}$.

Historically  the  original  examples  of  groups  arose  from  two  distinct  sources, 
and  it  took  a  while  for  the  above  definition  of  group  to  be  distilled  out  as  the 
essence  of  the  matter. 

One  of  the  two  sources  involved  number  systems  and  vectors.  Here  are 
examples. 

Examples. 

(1)  Additive  groups  of  familiar  number  systems.  The  systems  in  question  are 
the  integers  Z,  the  rational  numbers  Q,  the  real  numbers  R,  and  the  complex 




IV. Groups and Group Actions


numbers  C.  In  each  case  the  set  with  its  usual  operation  of  addition  forms  an 
abelian  group.  The  group  properties  of  Z  under  addition  are  taken  as  known  in 
advance  in  this  book,  as  mentioned  in  Section  A3  of  the  appendix,  and  the  group 
properties of Q, R, and C under addition are sketched in Sections A3 and A4 of
the  appendix  as  part  of  the  development  of  these  number  systems. 

(2)  Multiplicative  groups  connected  with  familiar  number  systems.  In  the 
cases of Q, R, and C, the nonzero elements form a group under multiplication.
These groups are denoted by $\mathbb{Q}^\times$, $\mathbb{R}^\times$, and $\mathbb{C}^\times$. Again the properties of a group
for each of them are properties that are sketched during the development of each
of these number systems in Sections A3 and A4 of the appendix. With Z, the
nonzero integers do not form a group under multiplication, because only the two
units, i.e., the divisors +1 and −1 of 1, have inverses. The units do form a group,
however, under multiplication, and the group of units is denoted by $\mathbb{Z}^\times$.

(3) Vector spaces under addition. Spaces such as $\mathbb{Q}^n$ and $\mathbb{R}^n$ and $\mathbb{C}^n$ provide
us  with  further  examples  of  abelian  groups.  In  fact,  the  defining  properties  of 
addition  in  a  vector  space  are  exactly  the  defining  properties  of  an  abelian  group. 
Thus  every  vector  space  provides  us  with  an  example  of  an  abelian  group  if  we 
simply  ignore  the  scalar  multiplication. 

(4) Integers modulo m, under addition. Another example related to number
systems is the additive group of integers modulo a positive integer m. Let us say
that an integer $n_1$ is congruent modulo m to an integer $n_2$ if m divides $n_1 - n_2$.
One writes $n_1 \equiv n_2$ or $n_1 \equiv n_2 \bmod m$ or $n_1 \equiv n_2 \pmod m$ for this relation.¹ It
is an equivalence relation, and we can write [n] for the equivalence class of n
when it is helpful to do so. The division algorithm (Proposition 1.1) tells us that
each equivalence class has one and only one member between 0 and $m - 1$. Thus
there are exactly m equivalence classes, and we know a representative of each.
The set of classes will be denoted by² Z/mZ. The point is that Z/mZ inherits
an abelian-group structure from the abelian-group structure of Z. Namely, we
attempt to define

$$[a] + [b] = [a + b].$$

To see that this formula actually defines an operation on Z/mZ, we need to
check that the result is meaningful if the representatives of the classes [a] and
[b] are changed. Thus let $[a] = [a']$ and $[b] = [b']$. Then m divides $a - a'$ and
$b - b'$, and m must divide the sum $(a - a') + (b - b') = (a + b) - (a' + b')$;
consequently $[a + b] = [a' + b']$, and addition is well defined. The same kind of

¹ This notation was anticipated in a remark explaining the classical form of the Chinese Remainder
Theorem (Corollary 1.9).

² The notation Z/(m) is an allowable alternative. Some authors, particularly in topology, write
$\mathbb{Z}_m$ for this set, but the notation $\mathbb{Z}_m$ can cause confusion since $\mathbb{Z}_p$ is the standard notation for the
“p-adic integers” when p is prime. These are defined in Chapter VI of Advanced Algebra.
argument shows that the associativity and commutativity of addition in Z imply
associativity and commutativity in Z/mZ. The identity element is [0], and group
inverses (negatives) are given by $-[a] = [-a]$. Therefore Z/mZ is an abelian
group under addition, and it has m elements. If x and y are members of Z/mZ,
their sum is often denoted by $x + y \bmod m$.
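A minimal sketch of Z/mZ in Python, assuming each class [n] is stored by its unique representative in {0, ..., m − 1} supplied by the division algorithm. The class name `Mod` is our own. The loop checks the well-definedness argument above over a range of representatives: changing a and b by multiples of m does not change [a + b].

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Mod:
    """The class [n] in Z/mZ, stored by its representative 0 <= n < m."""
    n: int
    m: int

    def __post_init__(self):
        # For m > 0, Python's % returns the division-algorithm remainder,
        # even when n is negative.
        object.__setattr__(self, "n", self.n % self.m)

    def __add__(self, other):
        assert self.m == other.m, "classes must have the same modulus"
        return Mod(self.n + other.n, self.m)      # [a] + [b] = [a + b]

    def __neg__(self):
        return Mod(-self.n, self.m)               # -[a] = [-a]

m = 5
# Well-definedness: replacing a by a' = a + s*m and b by b' = b + t*m
# does not change the class [a + b].
for a in range(-10, 10):
    for b in range(-10, 10):
        for s in (-2, 0, 3):
            for t in (-1, 4):
                assert Mod(a + b, m) == Mod((a + s * m) + (b + t * m), m)

print(Mod(3, 5) + Mod(4, 5))   # Mod(n=2, m=5)
print(-Mod(2, 5))              # Mod(n=3, m=5)
```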

The  other  source  of  early  examples  of  groups  historically  has  the  members  of 
the group operating as transformations of some auxiliary space. Before abstracting
matters, let us consider some concrete examples, ignoring some of the details
of  verifying  the  defining  properties  of  a  group. 

Examples,  continued. 

(5) Permutations. A permutation of a nonempty finite set E of n elements is a
one-one function from E onto itself. Permutations were introduced in Section 1.4.
The product of two permutations is just the composition, defined by $(\sigma\tau)(x) = \sigma(\tau(x))$
for x in E, with the symbol $\circ$ for composition dropped. The resulting
operation makes the set of permutations of E into a group: we already observed
in Section 1.4 that composition is associative, and it is plain that the identity
permutation may be taken as the group identity and that the inverse function to
a permutation is the group inverse. The group is called the symmetric group
on the n letters of E. It has $n!$ members for $n \ge 1$. The notation $\mathfrak{S}_n$ is often
used for this group, especially when $E = \{1, \dots, n\}$. Signs ±1 were defined
for permutations in Section 1.4, and we say that a permutation is even or odd
according as its sign is +1 or −1. The sign of a product is the product of the
signs, according to Proposition 1.24, and it follows that the even permutations
form a subgroup of $\mathfrak{S}_n$. This subgroup is called the alternating group on n
letters and is denoted by $\mathfrak{A}_n$. It has $\frac{1}{2}(n!)$ members if $n \ge 2$.
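The symmetric and alternating groups can be explored directly for small n. In this sketch a permutation of {0, ..., n − 1} (0-based labels, chosen for convenience instead of {1, ..., n}) is a tuple `p` with `p[x]` the image of x, the product στ is the composition x ↦ σ(τ(x)), and the sign is computed from the parity of the number of inversions, a standard method that yields the usual sign. The helper names are illustrative.

```python
from itertools import permutations
from math import factorial

def compose(sigma, tau):
    """Product sigma*tau of permutations: x -> sigma(tau(x))."""
    return tuple(sigma[tau[x]] for x in range(len(tau)))

def sign(p):
    """Sign of a permutation, from the parity of its number of inversions."""
    n = len(p)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
    return -1 if inversions % 2 else 1

n = 4
S_n = list(permutations(range(n)))          # all n! permutations of {0,...,n-1}
A_n = [p for p in S_n if sign(p) == 1]      # the even ones
print(len(S_n), factorial(n))               # 24 24
print(len(A_n))                             # 12

# The sign of a product is the product of the signs ...
assert all(sign(compose(s, t)) == sign(s) * sign(t) for s in S_n for t in S_n)
# ... so the even permutations are closed under products.
assert all(compose(s, t) in A_n for s in A_n for t in A_n)
```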

(6) Symmetries of a regular polygon. Imagine a regular polygon in $\mathbb{R}^2$ centered
at the origin. The plane-geometry rotations and reflections about the origin that
carry the polygon to itself form a group. If the number of sides of the polygon
is n, then the group always contains the rotations through all multiples of the
angle $2\pi/n$. The rotations themselves form an n-element subgroup of the group
of all symmetries. To consider what reflections give symmetries, we distinguish
the cases n odd and n even. When n is odd, the reflection in the line that passes
through any vertex and bisects the opposite side carries the polygon to itself, and
no other reflections have this property. Thus the group of symmetries contains n
reflections. When n is even, the reflection in the line passing through any vertex
and the opposite vertex carries the polygon to itself, and so does the reflection in
the line that bisects a side and also the opposite side. There are $n/2$ reflections of
each kind, and hence the group of symmetries again contains n reflections. The
group of symmetries thus has $2n$ elements in all cases. It is called the dihedral
group $D_n$. The group $D_n$ is isomorphic to a certain subgroup of the permutation
group $\mathfrak{S}_n$. Namely, we number the vertices of the polygon, and we associate to
each member of $D_n$ the permutation that moves the vertices the way the member
of $D_n$ does.
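The realization of $D_n$ inside the symmetric group can be carried out mechanically. Assuming the vertices sit at the angles 2πk/n and are labeled k = 0, ..., n − 1 (our own labeling), the rotation through 2π/n sends k to k + 1 mod n, and the reflection in the x-axis sends k to −k mod n. These two symmetries generate the whole group, a standard fact used here without proof; the sketch closes them under composition and finds 2n elements, in agreement with the count above.

```python
def compose(sigma, tau):
    """Product sigma*tau: first apply tau, then sigma."""
    return tuple(sigma[tau[x]] for x in range(len(tau)))

def dihedral(n):
    """Generate D_n as permutations of the vertex labels 0, ..., n-1."""
    rotation = tuple((k + 1) % n for k in range(n))    # turn by 2*pi/n
    reflection = tuple((-k) % n for k in range(n))     # flip in the x-axis
    identity = tuple(range(n))
    group, frontier = {identity}, [identity]
    # Multiply by the generators until no new permutations appear.
    while frontier:
        g = frontier.pop()
        for h in (rotation, reflection):
            gh = compose(g, h)
            if gh not in group:
                group.add(gh)
                frontier.append(gh)
    return group

for n in range(3, 9):
    print(n, len(dihedral(n)))   # 3 6, 4 8, ..., 8 16
```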

(7) General linear group. With F equal to Q or R or C, consider any
n-dimensional vector space V over F. One possibility is $V = F^n$, but we do not
insist on this choice. Among all one-one functions carrying V onto itself, let
G consist of the linear ones. The composition of two linear maps is linear, and
the inverse of an invertible function is linear if the given function is linear. The
result is a group known as the general linear group GL(V). When $V = F^n$,
we know from Chapter II that we can identify linear maps from $F^n$ to itself with
matrices in $M_{nn}(F)$ and that composition corresponds to matrix multiplication.
It follows that the set of all invertible matrices in $M_{nn}(F)$ is a group, which is
denoted by GL(n, F), and that this group is isomorphic to $GL(F^n)$. The set SL(V)
or SL(n, F) of all members of GL(V) or GL(n, F) of determinant 1 is a group
since the determinant of a product is the product of the determinants; it is called
the special linear group. The dihedral group $D_n$ is isomorphic to a subgroup of
GL(2, R) since each rotation and reflection of $\mathbb{R}^2$ that fixes the origin is given by
the operation of a 2-by-2 matrix.
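For F = Q, exact rational arithmetic makes the statement about determinants easy to test. This sketch uses Python's `fractions.Fraction` for the entries; the helpers `mat_mul`, `det`, and `inverse` are our own 2-by-2 versions. It checks that det is multiplicative and that products and inverses of matrices of determinant 1 again have determinant 1, which is what makes SL(2, Q) a subgroup of GL(2, Q).

```python
from fractions import Fraction as Fr

def mat_mul(A, B):
    """Product of 2-by-2 matrices given as ((a, b), (c, d))."""
    return tuple(tuple(sum(A[i][k] * B[k][j] for k in range(2))
                       for j in range(2)) for i in range(2))

def det(A):
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

def inverse(A):
    """Inverse of an invertible 2-by-2 matrix over Q."""
    d = det(A)
    return ((A[1][1] / d, -A[0][1] / d), (-A[1][0] / d, A[0][0] / d))

A = ((Fr(2), Fr(3)), (Fr(1), Fr(2)))       # det = 1
B = ((Fr(1, 2), Fr(0)), (Fr(5), Fr(2)))    # det = 1
C = ((Fr(1), Fr(4)), (Fr(3), Fr(5)))       # det = -7, in GL(2, Q) only

assert det(mat_mul(A, C)) == det(A) * det(C)   # det is multiplicative
print(det(mat_mul(A, B)), det(inverse(B)))     # 1 1
```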

(8) Orthogonal and unitary groups. If V is a finite-dimensional inner-product
space over R or C, Chapter III referred to the linear maps carrying the space
to itself and preserving lengths of vectors as orthogonal in the real case and
unitary in the complex case. Such linear maps are invertible. The condition of
preserving lengths of vectors is maintained under composition and inverses, and
it follows that the orthogonal or unitary linear maps form a subgroup O(V) or
U(V) of the general linear group GL(V). One writes O(n) for $O(\mathbb{R}^n)$ and U(n)
for $U(\mathbb{C}^n)$. The subgroup of members of O(V) or O(n) of determinant 1 is called
the rotation group SO(V) or SO(n). The subgroup of members of U(V) or U(n)
of determinant 1 is called the special unitary group SU(V) or SU(n).

Before  coming  to  Example  9,  let  us  establish  a  closure  property  under  the 
arithmetic  operations  for  certain  subsets  of  C.  We  are  going  to  use  the  theories  of 
polynomials  as  in  Chapter  I  and  of  vector  spaces  as  in  Chapter  II  with  the  rationals 
Q as the scalars. Fix a complex number θ, and form the result of evaluating at θ
every polynomial in one indeterminate with coefficients in Q. The resulting set
of complex numbers comes by substituting θ for X in the members of Q[X], and
we denote this subset of C by Q[θ].

Suppose that θ has the property that the set $\{1, \theta, \theta^2, \dots, \theta^n\}$ is linearly
dependent over Q for some integer $n \ge 1$, i.e., has the property that $F_\theta(\theta) = 0$ for
some nonzero member $F_\theta$ of Q[X] of degree $\le n$. For example, if $\theta = \sqrt{2}$, then
the set $\{1, \sqrt{2}, (\sqrt{2})^2\}$ is linearly dependent since $2 - (\sqrt{2})^2 = 0$; if $\theta = e^{2\pi i/5}$,
then $\{1, \theta, \theta^2, \theta^3, \theta^4, \theta^5\}$ is linearly dependent since $1 - \theta^5 = 0$, or alternatively
since $1 + \theta + \theta^2 + \theta^3 + \theta^4 = 0$.

Returning to the general θ, we lose no generality if we assume that the polynomial
$F_\theta$ has degree exactly n. If we divide the equation $F_\theta(\theta) = 0$ by the leading
coefficient, we obtain an equality $\theta^n = G_\theta(\theta)$, where $G_\theta$ is the zero polynomial
or is a nonzero polynomial of degree at most $n - 1$. Then $\theta^{n+m} = \theta^m G_\theta(\theta)$, and
we see inductively that every power $\theta^r$ with $r \ge n$ is a linear combination of the
members of the set $\{1, \theta, \theta^2, \dots, \theta^{n-1}\}$. This set is therefore a spanning set for
the vector space Q[θ], and we find that Q[θ] is finite-dimensional, with dimension
at most n. Since Q[θ] is spanned by the nonnegative integer powers of θ and since a
product of two such powers is again such a power, the vector space Q[θ] is closed under
multiplication. More striking is that Q[θ] is closed under division, as is asserted
in the following proposition.
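The inductive reduction in the preceding paragraph is easy to mechanize. Assuming $\theta^n = G_\theta(\theta)$ with $G_\theta(X) = g_0 + g_1 X + \cdots + g_{n-1}X^{n-1}$ given by its coefficient list `g`, the sketch below computes the coordinates of $\theta^r$ relative to {1, θ, ..., θ^{n−1}} by multiplying by θ one step at a time and rewriting each overflow term $\theta^n$. The function name `power_coords` is hypothetical.

```python
from fractions import Fraction

def power_coords(r, g):
    """Coordinates of theta**r with respect to 1, theta, ..., theta**(n-1),
    assuming theta**n = g[0] + g[1]*theta + ... + g[n-1]*theta**(n-1)."""
    n = len(g)
    coords = [Fraction(1)] + [Fraction(0)] * (n - 1)    # theta**0 = 1
    for _ in range(r):
        # Multiply by theta: each coordinate moves up one degree, and the
        # old top coordinate becomes the coefficient of theta**n ...
        top = coords[-1]
        coords = [Fraction(0)] + coords[:-1]
        # ... which is rewritten using theta**n = G_theta(theta).
        coords = [c + top * gi for c, gi in zip(coords, g)]
    return coords

# theta = sqrt(2): theta**2 = 2, so g = [2, 0]; theta**5 = 4*theta.
assert power_coords(5, [2, 0]) == [0, 4]
# theta = exp(2*pi*i/5): theta**5 = 1, so g = [1, 0, 0, 0, 0]; theta**7 = theta**2.
assert power_coords(7, [1, 0, 0, 0, 0]) == [0, 0, 1, 0, 0]
```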

Proposition 4.1. Let θ be in C, and suppose for some integer $n \ge 1$ that the set
$\{1, \theta, \theta^2, \dots, \theta^n\}$ is linearly dependent over Q. Then the finite-dimensional
rational vector space Q[θ] is closed under taking reciprocals (of nonzero elements),
as well as multiplication, and hence is closed under division.

Remarks. Under the hypotheses of Proposition 4.1, Q[θ] is called an
algebraic number field,³ or simply a number field, and θ is called an algebraic
number. The relevant properties of C that are used in proving the proposition
are  that  C  is  closed  under  the  usual  arithmetic  operations,  that  these  satisfy  the 
usual  properties,  and  that  Q  is  a  subset  of  C.  The  deeper  closure  properties  of  C 
that  are  developed  in  Sections  A3  and  A4  of  the  appendix  play  no  role. 

PROOF. We have seen that Q[θ] is closed under multiplication. If x is a nonzero
member of Q[θ], then all positive powers of x must be in Q[θ], and the fact that
dim Q[θ] ≤ n forces $\{1, x, x^2, \dots, x^n\}$ to be linearly dependent. Therefore there
are integers j and k with $0 \le j < k \le n$ such that $c_j x^j + c_{j+1} x^{j+1} + \cdots + c_k x^k = 0$
for some rational numbers $c_j, \dots, c_k$ with $c_k \ne 0$. Since x is assumed nonzero,
we can discard unnecessary terms and arrange that $c_j \ne 0$. Dividing by $c_j x^j$,
which is legitimate in C because $x \ne 0$, we obtain

$$1 = x(-c_j^{-1}c_{j+1} - c_j^{-1}c_{j+2}x - \cdots - c_j^{-1}c_k x^{k-j-1}),$$

and the reciprocal of x has been exhibited as in Q[θ]. □
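The proof is constructive, and for θ = √2 it can be followed exactly. Here an element a + b√2 of Q[√2] is stored as a pair of `Fraction`s (an encoding of our own). Expanding shows that $x = a + b\sqrt{2}$ satisfies $x^2 - 2ax + (a^2 - 2b^2) = 0$, a dependence among 1, x, $x^2$ with $c_0 = a^2 - 2b^2$, $c_1 = -2a$, $c_2 = 1$. Since √2 is irrational, $c_0 \ne 0$ whenever $x \ne 0$, so j = 0 in the proof and the displayed formula gives the reciprocal.

```python
from fractions import Fraction

class QSqrt2:
    """Element a + b*sqrt(2) of Q[sqrt(2)] with rational a, b."""
    def __init__(self, a, b=0):
        self.a, self.b = Fraction(a), Fraction(b)

    def __add__(self, other):
        return QSqrt2(self.a + other.a, self.b + other.b)

    def __mul__(self, other):
        # (a + b r)(c + d r) = (ac + 2bd) + (ad + bc) r, where r*r = 2
        return QSqrt2(self.a * other.a + 2 * self.b * other.b,
                      self.a * other.b + self.b * other.a)

    def __eq__(self, other):
        return (self.a, self.b) == (other.a, other.b)

    def __repr__(self):
        return f"{self.a} + {self.b}*sqrt(2)"

def reciprocal(x):
    """Follow the proof of Proposition 4.1 for theta = sqrt(2)."""
    # Dependence c0 + c1*x + c2*x**2 = 0 among 1, x, x**2.
    c0, c1, c2 = x.a**2 - 2 * x.b**2, -2 * x.a, Fraction(1)
    assert QSqrt2(c0) + QSqrt2(c1) * x + QSqrt2(c2) * x * x == QSqrt2(0)
    if c0 == 0:
        raise ZeroDivisionError("x must be nonzero")
    # 1 = x * (-c0^{-1} c1 - c0^{-1} c2 x), so the bracket is 1/x.
    return QSqrt2(-c1 / c0) + QSqrt2(-c2 / c0) * x

x = QSqrt2(3, 1)                 # 3 + sqrt(2)
y = reciprocal(x)
print(y)                         # 3/7 + -1/7*sqrt(2)
assert x * y == QSqrt2(1)
```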

Examples,  continued. 

(9) Galois’s notion of automorphisms of number fields. Let θ be a complex
number as in Proposition 4.1. The subject of Galois theory, whose details will

³ The definition of “algebraic number field” that is given later in the book is ostensibly more
general, but the Theorem of the Primitive Element in Chapter IX will show that it amounts to the
same thing as this.
be discussed in Chapter IX and whose full utility will be glimpsed only later,
works in an important special case with the “automorphisms” of Q[θ] that fix Q.
The automorphisms are the one-one functions from Q[θ] onto itself that respect
addition and multiplication and carry every element of Q to itself. The identity
is such a function, the composition of two such functions is again one, and the
inverse of such a function is again one. Therefore the automorphisms of Q[θ]
form a group under composition. We call this group Gal(Q[θ]/Q). Let us see
that it is finite. In fact, if σ is in Gal(Q[θ]/Q), then σ is determined by its effect
on θ, since we must have $\sigma(F(\theta)) = F(\sigma(\theta))$ for every F in Q[X]. We know
that there is some nonzero polynomial $F_\theta(X)$ such that $F_\theta(\theta) = 0$. Applying σ
to this equality, we see that $F_\theta(\sigma(\theta)) = 0$. Therefore $\sigma(\theta)$ has to be a root of
$F_\theta$. Viewing $F_\theta$ as in C[X], we can apply Corollary 1.14 and see that $F_\theta$ has only
finitely many complex roots. Therefore there are only finitely many possibilities
for σ, and the group Gal(Q[θ]/Q) has to be finite. Galois theory shows that
this group gives considerable insight into the structure of Q[θ]. For example, it
allows one to derive the Fundamental Theorem of Algebra (Theorem 1.18) just
from  algebra  and  the  Intermediate  Value  Theorem  (Section  A3  of  the  appendix); 
it  allows  one  to  show  the  impossibility  of  certain  constructions  in  plane  geometry 
by  straightedge  and  compass;  and  it  allows  one  to  show  that  a  quintic  polynomial 
with  rational  coefficients  need  not  have  a  root  that  is  expressible  in  terms  of 
rational  numbers,  arithmetic  operations,  and  the  extraction  of  square  roots,  cube 
roots,  and  so  on.  We  return  to  these  matters  in  Chapter  IX. 
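For θ = √2 and $F_\theta(X) = X^2 - 2$, the argument above shows that an automorphism σ must send √2 to ±√2, so Gal(Q[√2]/Q) has at most two elements. The sketch below stores a + b√2 as a pair (a, b) of `Fraction`s (our own encoding) and checks, on a sample of elements, that the candidate σ(a + b√2) = a − b√2 respects addition and multiplication. A finite sample is evidence rather than proof, though here the identity σ(xy) = σ(x)σ(y) is also easy to verify by hand.

```python
from fractions import Fraction
from itertools import product

# An element a + b*sqrt(2) of Q[sqrt(2)] is stored as the pair (a, b).
def add(x, y):
    return (x[0] + y[0], x[1] + y[1])

def mul(x, y):
    """(a + b*sqrt2)(c + d*sqrt2) = (ac + 2bd) + (ad + bc)*sqrt2."""
    (a, b), (c, d) = x, y
    return (a * c + 2 * b * d, a * d + b * c)

def sigma(x):
    """Candidate automorphism: fixes Q and sends sqrt(2) to -sqrt(2)."""
    return (x[0], -x[1])

samples = [(Fraction(a), Fraction(b, 2)) for a, b in product(range(-2, 3), repeat=2)]
assert all(sigma(add(x, y)) == add(sigma(x), sigma(y)) for x in samples for y in samples)
assert all(sigma(mul(x, y)) == mul(sigma(x), sigma(y)) for x in samples for y in samples)
# sigma is its own inverse, so it is one-one and onto.
assert all(sigma(sigma(x)) == x for x in samples)
print("sigma respects + and * on", len(samples), "sample elements")
```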

Examples  5-9,  which  all  involve  auxiliary  spaces,  fit  the  pattern  that  the 
members  of  the  group  are  invertible  transformations  of  the  auxiliary  space  and  the 
group  operation  is  composition.  This  notion  will  be  abstracted  in  Section  6  and 
will  lead  to  the  notion  of  a  “group  action.”  For  now,  let  us  see  why  we  obtained 
groups in each case. If X is any nonempty set, then the set of invertible functions
$f : X \to X$ forms a group under composition, composition being defined by
$(fg)(x) = f(g(x))$ with the usual symbol $\circ$ dropped. The associative law is just
a matter of unwinding this definition:

$$((fg)h)(x) = (fg)(h(x)) = f(g(h(x))) = f((gh)(x)) = (f(gh))(x).$$

The  identity  function  is  the  identity  of  the  group,  and  inverse  functions  provide 
the  inverse  elements  in  the  group. 

For our examples, the set X was E in Example 5, $\mathbb{R}^2$ in Example 6, V or $F^n$
in Example 7, V or $\mathbb{Q}^n$ or $\mathbb{R}^n$ or $\mathbb{C}^n$ in Example 8, and Q[θ] in Example 9. All
that  was  needed  in  each  case  was  to  know  that  our  set  G  of  invertible  functions 
from  X  to  itself  formed  a  subgroup  of  the  set  of  all  invertible  functions  from  X 
to  itself.  In  other  words,  we  had  only  to  check  that  G  contained  the  identity  and 
was  closed  under  composition  and  inversion.  Associativity  was  automatic  for  G 
because  it  was  valid  for  the  group  of  all  invertible  functions  from  X  to  itself. 
Actually, any group can be realized in the fashion of Examples 5-9. This is
the  content  of  the  next  proposition. 

Proposition 4.2 (Cayley’s Theorem). Any group G is isomorphic to a subgroup
of invertible functions on a set X. The set X can be taken to be G itself.
In particular any finite group with n elements is isomorphic to a subgroup of the
symmetric group $\mathfrak{S}_n$.

Proof. Define X = G, put $f_a(x) = ax$ for a in G, and let $G' = \{f_a \mid a \in G\}$.
To see that G' is a group, we need G' to contain the identity and to be closed
under composition and inverses. Since $f_1$ is the identity, the identity is indeed
in G'. Since $f_{ab}(x) = (ab)x = a(bx) = f_a(bx) = f_a(f_b(x)) = (f_a f_b)(x)$,
G' is closed under composition. The formula $f_a f_{a^{-1}} = f_1 = f_{a^{-1}} f_a$ then shows
that $f_{a^{-1}} = (f_a)^{-1}$ and that G' is closed under inverses. Thus G' is a group.

Define $\varphi : G \to G'$ by $\varphi(a) = f_a$. Certainly φ is onto G', and it is
one-one because $\varphi(a) = \varphi(b)$ implies $f_a = f_b$, $f_a(1) = f_b(1)$, and $a = b$. Also,
$\varphi(ab) = f_{ab} = f_a f_b = \varphi(a)\varphi(b)$, and hence φ is an isomorphism.

In  the  case  that  G  is  finite  with  n  elements,  G  is  exhibited  as  isomorphic 
to  a  subgroup  of  the  group  of  permutations  of  the  members  of  G.  Hence  it  is 
isomorphic to a subgroup of $\mathfrak{S}_n$. □
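The proof of Cayley’s Theorem is constructive. For the additive group Z/6Z (an arbitrary small example), the sketch records each $f_a$, which in additive notation is x ↦ a + x, as a permutation of the positions of the elements, then checks that a ↦ $f_a$ is one-one and carries products to compositions, as in the proof. The helper names are our own.

```python
def cayley_embedding(elements, op):
    """Map each a to f_a: x -> op(a, x), recorded as a tuple of positions,
    so f_a sends position i (element elements[i]) to the position of a*x."""
    index = {g: i for i, g in enumerate(elements)}
    return {a: tuple(index[op(a, x)] for x in elements) for a in elements}

def compose(sigma, tau):
    """Composition sigma*tau of permutations of positions: i -> sigma(tau(i))."""
    return tuple(sigma[tau[i]] for i in range(len(tau)))

def op(a, b):
    """Addition in Z/6Z (the group operation, written additively)."""
    return (a + b) % 6

G = list(range(6))
f = cayley_embedding(G, op)

# Each f_a is a permutation of G, and a -> f_a is one-one ...
assert all(sorted(f[a]) == G for a in G)
assert len(set(f.values())) == len(G)
# ... and carries products to compositions: f_{ab} = f_a f_b.
assert all(f[op(a, b)] == compose(f[a], f[b]) for a in G for b in G)
print(f[1])   # (1, 2, 3, 4, 5, 0)
```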

It  took  the  better  part  of  a  century  for  mathematicians  to  sort  out  that  two 
distinct notions are involved here: that of a group, as defined above, and that
of  a  group  action,  as  will  be  defined  in  Section  6.  In  sorting  out  these  matters, 
mathematicians  realized  that  it  is  wise  to  study  the  abstract  group  first  and  then 
to  study  the  group  in  the  context  of  its  possible  group  actions.  This  does  not  at  all 
mean  ignoring  group  actions  until  after  the  study  of  groups  is  complete;  indeed, 
we  shall  see  in  Sections  6,  7,  and  10  that  group  actions  provide  useful  tools  for 
the  study  of  abstract  groups. 

We turn to a discussion of two general group-theoretic notions: cyclic group
and  the  direct  product  of  two  or  more  groups.  The  second  of  these  notions  will 
be  discussed  only  briefly  now;  more  detail  will  come  in  Section  3. 

If a is an element of a group, we define $a^n$ for integers $n > 0$ inductively
by $a^1 = a$ and $a^n = a^{n-1}a$. Then we can put $a^0 = 1$ and $a^{-n} = (a^{-1})^n$
for $n > 0$. A little checking, which we omit, shows that the ordinary rules of
exponents apply: $a^{m+n} = a^m a^n$ and $a^{mn} = (a^m)^n$ for all integers m and n. If the
underlying group is abelian and additive notation is being used, these formulas
read $(m + n)a = ma + na$ and $(mn)a = n(ma)$.

A  cyclic  group  is  a  group  with  an  element  a  such  that  every  element  is  a  power 
of  a.  The  element  a  is  called  a  generator  of  the  group,  and  the  group  is  said  to 
be  generated  by  a. 
Proposition 4.3. Each cyclic group G is isomorphic either to the additive
group  Z  of  integers  or  to  the  additive  group  Z/mZ  of  integers  modulo  m  for  some 
positive  integer  m. 

PROOF. If all $a^n$ are distinct, then the rule $a^{m+n} = a^m a^n$ implies that the
function $n \mapsto a^n$ is an isomorphism of Z with G. On the other hand, if $a^k = a^l$
with $k > l$, then $a^{k-l} = 1$ and there exists a positive integer n such that $a^n = 1$.
Let m be the least positive integer with $a^m = 1$. For any integers q and r, we
have $a^{qm+r} = (a^m)^q a^r = a^r$. Thus the function $\varphi : \mathbb{Z}/m\mathbb{Z} \to G$ given by
$\varphi([n]) = a^n$ is well defined, is onto G, and carries sums in Z/mZ to products in
G. If $0 \le l < k < m$, then $a^k \ne a^l$ since otherwise $a^{k-l}$ would be 1 with $0 < k - l < m$,
contrary to the minimality of m. Hence φ is one-one, and we conclude that
$\varphi : \mathbb{Z}/m\mathbb{Z} \to G$ is an isomorphism. □
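The proof of Proposition 4.3 also suggests how to compute m for an element of a finite group: multiply by a until 1 appears. This sketch does so for a permutation (tuples composed as in Example 5, an illustrative choice of group), computes negative powers through $a^{-n} = (a^{-1})^n$, and checks that $a^n$ depends only on n mod m, which is what makes [n] ↦ $a^n$ well defined. The function names are hypothetical.

```python
def compose(sigma, tau):
    """Product sigma*tau: x -> sigma(tau(x))."""
    return tuple(sigma[tau[x]] for x in range(len(tau)))

def inverse(p):
    """Inverse permutation: if p sends x to y, the inverse sends y to x."""
    inv = [0] * len(p)
    for x, y in enumerate(p):
        inv[y] = x
    return tuple(inv)

def power(a, n):
    """a**n for any integer n, with a**0 = 1 and a**(-n) = (a**-1)**n."""
    result = tuple(range(len(a)))          # identity permutation
    base = a if n >= 0 else inverse(a)
    for _ in range(abs(n)):
        result = compose(result, base)     # a**k = a**(k-1) a
    return result

def order(a):
    """Least positive m with a**m = 1 (it exists because the group is finite)."""
    identity, m, x = tuple(range(len(a))), 1, a
    while x != identity:
        x, m = compose(x, a), m + 1
    return m

a = (1, 2, 0, 4, 3)      # a 3-cycle times a disjoint transposition
m = order(a)
print(m)                 # 6
# a**n depends only on n mod m, so [n] -> a**n is well defined on Z/mZ.
assert all(power(a, n) == power(a, n % m) for n in range(-20, 20))
# The m powers a**0, ..., a**(m-1) are distinct, so the map is one-one.
assert len({power(a, r) for r in range(m)}) == m
```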

Let us denote abstract cyclic groups by $C_\infty$ and $C_m$, the subscript indicating
the number of elements. Finite cyclic groups arise in guises other than as Z/mZ.
For example, the set of all elements $e^{2\pi i k/m}$ in C, with multiplication as
operation, forms a group isomorphic to $C_m$. So does the set of all rotation matrices

$$\begin{pmatrix} \cos 2\pi k/m & -\sin 2\pi k/m \\ \sin 2\pi k/m & \cos 2\pi k/m \end{pmatrix},$$

with matrix multiplication as operation.
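Numerically one can check that both k ↦ $e^{2\pi i k/m}$ and k ↦ the rotation matrix through 2πk/m turn addition modulo m into multiplication. Floating-point values are only approximate, so the sketch compares with a tolerance via `cmath.isclose` and `math.isclose`; m = 7 and the tolerance 1e-12 are arbitrary choices.

```python
import cmath
import math

def root(k, m):
    """The complex number e^{2 pi i k/m}."""
    return cmath.exp(2j * math.pi * k / m)

def rotation(k, m):
    """2-by-2 rotation matrix through the angle 2 pi k/m."""
    t = 2 * math.pi * k / m
    return ((math.cos(t), -math.sin(t)), (math.sin(t), math.cos(t)))

def mat_mul(A, B):
    return tuple(tuple(sum(A[i][r] * B[r][j] for r in range(2)) for j in range(2))
                 for i in range(2))

def close(A, B):
    """Entrywise comparison of 2-by-2 matrices up to rounding error."""
    return all(math.isclose(A[i][j], B[i][j], abs_tol=1e-12)
               for i in range(2) for j in range(2))

m = 7
for j in range(m):
    for k in range(m):
        s = (j + k) % m                                   # sum in Z/mZ
        assert cmath.isclose(root(j, m) * root(k, m), root(s, m), abs_tol=1e-12)
        assert close(mat_mul(rotation(j, m), rotation(k, m)), rotation(s, m))
print("both maps carry sums mod", m, "to products")
```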

Proposition  4.4.  Any  subgroup  of  a  cyclic  group  is  cyclic. 

Remark.  The  proof  of  Proposition  4.4  exhibits  a  one-one  correspondence 
between the subgroups of Z/mZ and the positive integers k dividing m.

PROOF. Let G be a cyclic group with generator a, and let H be a subgroup.
We may assume that $H \ne \{1\}$. Then there exists a positive integer n such that
$a^n$ is in H, and we let k be the smallest such positive integer. If n is any integer
such that $a^n$ is in H, then Proposition 1.2 produces integers x and y such that
$xk + yn = d$, where $d = \mathrm{GCD}(k, n)$. The equation $a^d = (a^k)^x (a^n)^y$ exhibits
$a^d$ as in H, and the minimality of k forces $d \ge k$. Since $\mathrm{GCD}(k, n) \le k$, we
conclude that $d = k$. Hence k divides n. Consequently H consists of the powers
of $a^k$ and is cyclic. □
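The remark preceding the proof can be confirmed by brute force for small m. This sketch (our own code; it examines every subset, so it is only practical for small m) lists the subsets of Z/mZ that contain 0 and are closed under addition and negation, and compares them with the cyclic subgroups generated by [k] for the positive divisors k of m.

```python
from itertools import combinations

def subgroups(m):
    """All subgroups of Z/mZ, found by checking every subset containing 0."""
    found = []
    for size in range(m):
        for rest in combinations(range(1, m), size):
            H = {0, *rest}
            if all((a + b) % m in H and (-a) % m in H for a in H for b in H):
                found.append(frozenset(H))
    return found

def generated(k, m):
    """The cyclic subgroup of Z/mZ generated by [k]."""
    return frozenset((k * n) % m for n in range(m))

for m in range(1, 13):
    divisors = [k for k in range(1, m + 1) if m % k == 0]
    assert set(subgroups(m)) == {generated(k, m) for k in divisors}
    assert len(subgroups(m)) == len(divisors)

# Six subgroups of Z/12Z, one for each divisor of 12.
print(sorted(sorted(H) for H in subgroups(12)))
```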

A  notion  of  the  direct  product  of  two  groups  is  definable  in  the  same  way  as 
was done with vector spaces in Section II.6, except that a little care is needed in
saying  how  this  construction  interacts  with  mappings.  As  with  the  corresponding 
construction  for  vector  spaces,  one  can  define  an  explicit  “external”  direct  product, 
and  one  can  recognize  a  given  group  as  an  “internal”  direct  product,  i.e.,  as 
isomorphic  to  an  external  direct  product.  We  postpone  a  fuller  discussion  of  direct 
product,  as  well  as  all  comments  about  direct  sums  and  mappings  associated  with 
direct  sums  and  direct  products,  to  Section  3. 

The external direct product $G_1 \times G_2$ of two groups $G_1$ and $G_2$ is a group
whose underlying set is the set-theoretic product of $G_1$ and $G_2$ and whose group
law is $(g_1, g_2)(g_1', g_2') = (g_1 g_1', g_2 g_2')$. The identity is (1, 1), and the formula for
inverses is $(g_1, g_2)^{-1} = (g_1^{-1}, g_2^{-1})$. The two subgroups $G_1 \times \{1\}$ and $\{1\} \times G_2$
of $G_1 \times G_2$ commute with each other.

A group G is the internal direct product of two subgroups $G_1$ and $G_2$ if the
function from the external direct product $G_1 \times G_2$ to G given by $(g_1, g_2) \mapsto g_1 g_2$
is an isomorphism of groups. The literal analog of Proposition 2.30, which gave
three equivalent definitions of internal direct product⁴ of vector spaces, fails here.
It is not sufficient that $G_1$ and $G_2$ be two subgroups such that $G_1 \cap G_2 = \{1\}$ and
every element in G decomposes as a product $g_1 g_2$ with $g_1 \in G_1$ and $g_2 \in G_2$.
For example, with $G = \mathfrak{S}_3$, the two subgroups

$$G_1 = \{1, (1\,2)\} \quad \text{and} \quad G_2 = \{1, (1\,2\,3), (1\,3\,2)\}$$

have these properties, but G is not isomorphic to $G_1 \times G_2$ because the elements
of $G_1$ do not commute with the elements of $G_2$.
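The example can be checked mechanically. In this sketch permutations of {1, 2, 3} are stored as dicts (an illustrative encoding) and στ means x ↦ σ(τ(x)), as in Example 5. It confirms that $G_1$ and $G_2$ meet only in the identity and that every element of $\mathfrak{S}_3$ is a product $g_1 g_2$, while some $g_1$ and $g_2$ fail to commute.

```python
from itertools import permutations

def compose(sigma, tau):
    """Product sigma*tau: x -> sigma(tau(x))."""
    return {x: sigma[tau[x]] for x in tau}

def perm(images):
    """Permutation of {1, 2, 3} sending i to images[i - 1]."""
    return dict(zip((1, 2, 3), images))

identity = perm((1, 2, 3))
G1 = [identity, perm((2, 1, 3))]                        # {1, (1 2)}
G2 = [identity, perm((2, 3, 1)), perm((3, 1, 2))]       # {1, (1 2 3), (1 3 2)}
S3 = [perm(p) for p in permutations((1, 2, 3))]

# G1 and G2 meet only in the identity ...
assert [g for g in G1 if g in G2] == [identity]
# ... and every element of S3 is a product g1 g2 ...
products = [compose(g1, g2) for g1 in G1 for g2 in G2]
assert all(s in products for s in S3)
# ... but elements of G1 and G2 need not commute.
commute = all(compose(a, b) == compose(b, a) for a in G1 for b in G2)
print(commute)   # False
```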

Proposition 4.5. If G is a group and $G_1$ and $G_2$ are subgroups, then the
following conditions are equivalent:

(a) G is the internal direct product of $G_1$ and $G_2$,

(b) every element in G decomposes uniquely as a product $g_1 g_2$ with $g_1 \in G_1$
and $g_2 \in G_2$, and every member of $G_1$ commutes with every member of
$G_2$,

(c) $G_1 \cap G_2 = \{1\}$, every element in G decomposes as a product $g_1 g_2$ with
$g_1 \in G_1$ and $g_2 \in G_2$, and every member of $G_1$ commutes with every
member of $G_2$.

PROOF. We have seen that (a) implies (b). If (b) holds and $g$ is in $G_1 \cap G_2$, then the formulas $1 = 1 \cdot 1$ and $1 = gg^{-1}$ (with $g \in G_1$ and $g^{-1} \in G_2$) and the uniqueness of the decomposition of 1 as a product together imply that $g = 1$. Hence (c) holds.

If (c) holds, define $\varphi: G_1 \times G_2 \to G$ by $\varphi(g_1, g_2) = g_1g_2$. This map is certainly onto $G$. To see that it is one-one, suppose that $\varphi(g_1, g_2) = \varphi(g_1', g_2')$. Then $g_1g_2 = g_1'g_2'$ and hence $g_1'^{-1}g_1 = g_2'g_2^{-1}$. Since $G_1 \cap G_2 = \{1\}$, $g_1'^{-1}g_1 = g_2'g_2^{-1} = 1$. Thus $(g_1, g_2) = (g_1', g_2')$, and $\varphi$ is one-one. Finally the fact that elements of $G_1$ commute with elements of $G_2$ implies that

$$\varphi((g_1, g_2)(g_1', g_2')) = \varphi(g_1g_1', g_2g_2') = g_1g_1'g_2g_2' = g_1g_2g_1'g_2' = \varphi(g_1, g_2)\varphi(g_1', g_2').$$

Therefore $\varphi$ is an isomorphism, and (a) holds. □
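The counterexample with $G = \mathfrak{S}_3$ that precedes Proposition 4.5 can be checked by brute force. The following is a minimal sketch, not part of the text. It assumes that a permutation of $\{1, 2, 3\}$ is stored as the tuple of its images and that products are composed right to left. The names `compose`, `G1`, `G2`, and `phi` are illustrative only.

```python
from itertools import permutations

# A permutation p of {1, 2, 3} is stored as the tuple (p(1), p(2), p(3)).
S3 = list(permutations((1, 2, 3)))
e = (1, 2, 3)


def compose(s, t):
    """Right-to-left composition: (s t)(x) = s(t(x))."""
    return tuple(s[t[x] - 1] for x in range(3))


G1 = [e, (2, 1, 3)]             # {1, (1 2)}
G2 = [e, (2, 3, 1), (3, 1, 2)]  # {1, (1 2 3), (1 3 2)}

# The two hypotheses that are not sufficient on their own.
trivial_intersection = set(G1) & set(G2) == {e}
products = [compose(a, b) for a in G1 for b in G2]
every_element_decomposes = set(products) == set(S3)

# The commutativity hypothesis of Proposition 4.5(c) fails.
elementwise_commute = all(compose(a, b) == compose(b, a) for a in G1 for b in G2)


def phi(a, b):
    """The map G1 x G2 -> G of the proof, (g1, g2) -> g1 g2."""
    return compose(a, b)


# phi is a bijection here, but it does not respect products.
phi_is_homomorphism = all(
    phi(compose(a, c), compose(b, d)) == compose(phi(a, b), phi(c, d))
    for a in G1 for b in G2 for c in G1 for d in G2
)
print(trivial_intersection, every_element_decomposes, elementwise_commute, phi_is_homomorphism)
# prints: True True False False
```

The six products $g_1g_2$ are distinct, so the decomposition is even unique. Only the commutativity condition fails, and that condition is exactly what (c) adds.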

Here are two examples of internal direct products of groups. In each let $\mathbb{R}^+$ be the multiplicative group of positive real numbers. The first example is $\mathbb{R}^\times = C_2 \times \mathbb{R}^+$ with $C_2 = \{\pm 1\}$ providing the sign. The second example is $\mathbb{C}^\times = S^1 \times \mathbb{R}^+$, where $S^1$ is the multiplicative group of complex numbers of absolute value 1; the isomorphism here is given by the polar-coordinate mapping $(e^{i\theta}, r) \mapsto re^{i\theta}$.

⁴The direct sum and direct product of two vector spaces were defined to be the same thing in Chapter II.

We conclude this section by giving an example of a group that falls outside the pattern of the examples above and by summarizing what groups we have identified with at most 15 elements.


Examples,  continued. 

(10) Groups associated with the quaternions. The set $\mathbb{H}$ of quaternions is an object like $\mathbb{R}$ or $\mathbb{C}$ in that it has both an addition/subtraction and a multiplication/division, but $\mathbb{H}$ is unlike $\mathbb{R}$ and $\mathbb{C}$ in that multiplication is not commutative. We give two constructions. In one we start from $\mathbb{R}^4$ with the standard basis vectors written as $1, i, j, k$. The multiplication table for these basis vectors is


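$$\begin{array}{llll}
1 \cdot 1 = 1, & 1 \cdot i = i, & 1 \cdot j = j, & 1 \cdot k = k, \\
i \cdot 1 = i, & ii = -1, & ij = k, & ik = -j, \\
j \cdot 1 = j, & ji = -k, & jj = -1, & jk = i, \\
k \cdot 1 = k, & ki = j, & kj = -i, & kk = -1,
\end{array}$$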

and the multiplication is extended to general elements by the usual distributive laws. The multiplicative identity is 1, and multiplicative inverses of nonzero elements are given by

$$(a1 + bi + cj + dk)^{-1} = s^{-1}a1 - s^{-1}bi - s^{-1}cj - s^{-1}dk$$

with $s = a^2 + b^2 + c^2 + d^2$. Since $ij = k$ while $ji = -k$, multiplication is not commutative. What takes work to see is that multiplication is associative. To see this, we give another construction, using $M_{22}(\mathbb{C})$. Within $M_{22}(\mathbb{C})$, take

$$1 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad i = \begin{pmatrix} i & 0 \\ 0 & -i \end{pmatrix}, \quad j = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}, \quad k = \begin{pmatrix} 0 & i \\ i & 0 \end{pmatrix},$$

and define $\mathbb{H}$ to be the linear span, with real coefficients, of these matrices. The operations are the usual matrix addition and multiplication. Then multiplication is associative, and we readily verify the multiplication table for $1, i, j, k$. A little computation verifies also the formula for multiplicative inverses. The set $\mathbb{H}^\times$ of nonzero elements forms a group under multiplication, and it is isomorphic to $\mathbb{R}^+ \times SU(2)$, where

$$SU(2) = \left\{ \begin{pmatrix} \alpha & \beta \\ -\bar\beta & \bar\alpha \end{pmatrix} \;\middle|\; \alpha, \beta \in \mathbb{C},\ |\alpha|^2 + |\beta|^2 = 1 \right\}$$

is the 2-by-2 special unitary group defined in Example 8. Of interest for our current purposes is the 8-element subgroup $\pm 1, \pm i, \pm j, \pm k$, which is called the quaternion group and will be denoted by $\mathbb{H}_8$.
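The multiplication table, extended by the distributive laws, translates directly into a few lines of code. That code can check some claims of Example 10 on small cases. This is a hedged sketch with three assumptions:

- a quaternion $a1 + bi + cj + dk$ is encoded as the coefficient tuple $(a, b, c, d)$;
- exact arithmetic uses Python's `fractions.Fraction`;
- `qmul`, `qinv`, and `UNITS` are illustrative names.

Checking associativity on the eight units is evidence, not a proof; the proof is the matrix realization.

```python
from fractions import Fraction
from itertools import product

BASIS = ("1", "i", "j", "k")
# TABLE[(u, v)] = (sign, w) means u v = sign * w, as in the multiplication table.
TABLE = {
    ("1", "1"): (1, "1"), ("1", "i"): (1, "i"), ("1", "j"): (1, "j"), ("1", "k"): (1, "k"),
    ("i", "1"): (1, "i"), ("i", "i"): (-1, "1"), ("i", "j"): (1, "k"), ("i", "k"): (-1, "j"),
    ("j", "1"): (1, "j"), ("j", "i"): (-1, "k"), ("j", "j"): (-1, "1"), ("j", "k"): (1, "i"),
    ("k", "1"): (1, "k"), ("k", "i"): (1, "j"), ("k", "j"): (-1, "i"), ("k", "k"): (-1, "1"),
}


def qmul(x, y):
    """Multiply coefficient tuples using the table and the distributive laws."""
    out = [0, 0, 0, 0]
    for m, u in enumerate(BASIS):
        for n, v in enumerate(BASIS):
            sign, w = TABLE[(u, v)]
            out[BASIS.index(w)] += sign * x[m] * y[n]
    return tuple(out)


def qinv(x):
    """Inverse of a nonzero quaternion: s^{-1}(a1 - bi - cj - dk), s = a^2+b^2+c^2+d^2."""
    a, b, c, d = x
    s = a * a + b * b + c * c + d * d
    return (Fraction(a, s), Fraction(-b, s), Fraction(-c, s), Fraction(-d, s))


one, i, j, k = (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)
# The eight elements +-1, +-i, +-j, +-k of the quaternion group H_8.
UNITS = [tuple(sign * t for t in q) for q in (one, i, j, k) for sign in (1, -1)]

closed = all(qmul(x, y) in UNITS for x, y in product(UNITS, repeat=2))
associative = all(
    qmul(qmul(x, y), z) == qmul(x, qmul(y, z)) for x, y, z in product(UNITS, repeat=3)
)
```

For example, `qmul(i, j)` returns `k`, while `qmul(j, i)` returns `(0, 0, 0, -1)`, which reflects the failure of commutativity.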
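The order of a finite group is the number of elements in the group. Let us list some of the groups we have discussed that have order at most 15:

$$\begin{array}{r|l}
\text{Order} & \text{Groups} \\ \hline
1 & C_1 \\
2 & C_2 \\
3 & C_3 \\
4 & C_4,\ C_2 \times C_2 \\
5 & C_5 \\
6 & C_6,\ D_3 \\
7 & C_7 \\
8 & C_8,\ C_4 \times C_2,\ C_2 \times C_2 \times C_2,\ D_4,\ \mathbb{H}_8 \\
9 & C_9,\ C_3 \times C_3 \\
10 & C_{10},\ D_5 \\
11 & C_{11} \\
12 & C_{12},\ C_6 \times C_2,\ D_6,\ \mathfrak{A}_4 \\
13 & C_{13} \\
14 & C_{14},\ D_7 \\
15 & C_{15}
\end{array}$$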

No two groups in the above table are isomorphic, as one readily checks by counting elements of each "order" in the sense of the next section. We shall see in Section 10 and in the problems at the end of the chapter that the above table is complete through order 15 except for one group of order 12. Some groups that we have discussed have been omitted from the above table because of isomorphisms with the groups above. For example, $\mathfrak{S}_2 \cong C_2$, $\mathfrak{A}_3 \cong C_3$, $C_3 \times C_2 \cong C_6$, $\mathfrak{S}_3 \cong D_3$, $C_5 \times C_2 \cong C_{10}$, $C_4 \times C_3 \cong C_{12}$, $D_3 \times C_2 \cong D_6$, $C_7 \times C_2 \cong C_{14}$, and $C_5 \times C_3 \cong C_{15}$.


2.  Quotient  Spaces  and  Homomorphisms 

Let $G$ be a group, and let $H$ be a subgroup. For purposes of this paragraph, say that $g_1$ in $G$ is equivalent to $g_2$ in $G$ if $g_1 = g_2h$ for some $h$ in $H$. The relation "equivalent" is an equivalence relation: it is reflexive because 1 is in $H$, it is symmetric since $H$ is closed under inverses, and it is transitive since $H$ is closed under products. The equivalence classes are called left cosets of $H$ in $G$. The left coset containing an element $g$ of $G$ is the set $gH = \{gh \mid h \in H\}$.

Examples. 

(1) When $G = \mathbb{Z}$ and $H = m\mathbb{Z}$, the left cosets are the sets $r + m\mathbb{Z}$, i.e., the sets $\{x \in \mathbb{Z} \mid x \equiv r \bmod m\}$ for the various values of $r$.

(2) When $G = \mathfrak{S}_3$ and $H = \{(1), (1\ 3)\}$, there are three left cosets: $H$, $(1\ 2)H = \{(1\ 2), (1\ 3\ 2)\}$, and $(2\ 3)H = \{(2\ 3), (1\ 2\ 3)\}$.

Similarly one can define the right cosets $Hg$ of $H$ in $G$. When $G$ is nonabelian, these need not coincide with the left cosets; in Example 2 above with $G = \mathfrak{S}_3$ and $H = \{(1), (1\ 3)\}$, the right coset $H(1\ 2) = \{(1\ 2), (1\ 2\ 3)\}$ is not a left coset.
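The coset computations in Example 2 and in the paragraph above can be reproduced mechanically. This sketch makes three assumptions. First, a permutation of $\{1, 2, 3\}$ is the tuple of its images. Second, products are composed right to left, as in the text. Third, `perm`, `left_coset`, and `right_coset` are hypothetical helper names.

```python
from itertools import permutations

N = 3
S3 = [tuple(p) for p in permutations(range(1, N + 1))]


def perm(*cycles):
    """Build a permutation of {1,...,N} from disjoint cycles, e.g. perm((1, 3, 2))."""
    images = list(range(1, N + 1))
    for cyc in cycles:
        for pos, x in enumerate(cyc):
            images[x - 1] = cyc[(pos + 1) % len(cyc)]
    return tuple(images)


def compose(s, t):
    """Right-to-left composition: (s t)(x) = s(t(x))."""
    return tuple(s[t[x] - 1] for x in range(N))


def left_coset(g, H):
    return frozenset(compose(g, h) for h in H)


def right_coset(H, g):
    return frozenset(compose(h, g) for h in H)


H = [perm(), perm((1, 3))]
left_cosets = {left_coset(g, H) for g in S3}
right_cosets = {right_coset(H, g) for g in S3}
```

Each of the three left cosets has two elements, as Lemma 4.6 below predicts. The right coset `right_coset(H, perm((1, 2)))` does not appear among them.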
Lemma 4.6. If $H$ is a subgroup of the group $G$, then any two left cosets of $H$ in $G$ have the same cardinality, namely $\operatorname{card} H$.

Remarks. We shall be especially interested in the case that $\operatorname{card} H$ is finite, and then we write $|H| = \operatorname{card} H$ for the number of elements in $H$.

PROOF. If $g_1H$ and $g_2H$ are given, then the map $g \mapsto g_2g_1^{-1}g$ is one-one on $G$ and carries $g_1H$ onto $g_2H$. Hence $g_1H$ and $g_2H$ have the same cardinality. Taking $g_1 = 1$, we see that this common cardinality is $\operatorname{card} H$. □

We write $G/H$ for the set $\{gH\}$ of all left cosets of $H$ in $G$, calling it the quotient space or left-coset space of $G$ by $H$. The set $\{Hg\}$ of right cosets is denoted by $H\backslash G$.

Theorem 4.7 (Lagrange's Theorem). If $G$ is a finite group and $H$ is a subgroup, then $|G| = |G/H|\,|H|$. Consequently the order of any subgroup of $G$ divides the order of $G$.

Remark. Using the formula in Theorem 4.7 three times yields the conclusion that if $H$ and $K$ are subgroups of a finite group $G$ with $K \subseteq H$, then $|G/K| = |G/H|\,|H/K|$.
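For convenience, here are the three applications of Theorem 4.7 behind the Remark, to the pairs $(G, K)$, $(G, H)$, and $(H, K)$, followed by the resulting computation:

$$|G| = |G/K|\,|K|, \qquad |G| = |G/H|\,|H|, \qquad |H| = |H/K|\,|K|,$$

$$|G/K| = \frac{|G|}{|K|} = \frac{|G/H|\,|H|}{|K|} = \frac{|G/H|\,|H/K|\,|K|}{|K|} = |G/H|\,|H/K|.$$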

PROOF. Lemma 4.6 shows that each left coset has $|H|$ elements. The left cosets are disjoint and exhaust $G$, and there are $|G/H|$ left cosets. Thus $G$ has $|G/H|\,|H|$ elements. □

If $a$ is an element of a group $G$, then we have seen that the powers $a^n$ of $a$ form a cyclic subgroup of $G$ that is isomorphic either to $\mathbb{Z}$ or to some group $\mathbb{Z}/m\mathbb{Z}$ for a positive integer $m$. We say that $a$ has finite order $m$ when the cyclic group is isomorphic to $\mathbb{Z}/m\mathbb{Z}$. Otherwise $a$ has infinite order. In the finite-order case the order of $a$ is thus the least positive integer $n$ such that $a^n = 1$.

Corollary  4.8.  If  G  is  a  finite  group,  then  each  element  a  of  G  has  finite  order, 
and  the  order  of  a  divides  the  order  of  G. 

Proof. The order of $a$ equals $|H|$ if $H = \{a^n \mid n \in \mathbb{Z}\}$, and Corollary 4.8 is thus a special case of Theorem 4.7. □
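As an illustration of Corollary 4.8 (not taken from the text), the following sketch computes the order of every element of $\mathfrak{S}_4$, a group of order 24, and confirms that each order divides 24. Two conventions are choices made only for this sketch. Permutations are 0-based tuples of images, and `order` is a hypothetical helper that keeps multiplying by $a$ until it reaches the identity.

```python
from collections import Counter
from itertools import permutations

S4 = list(permutations(range(4)))   # p[x] is the image of x; 24 elements
IDENTITY = tuple(range(4))


def compose(s, t):
    """Right-to-left composition: (s t)(x) = s(t(x))."""
    return tuple(s[t[x]] for x in range(len(t)))


def order(a):
    """Least n >= 1 with a^n = 1; it exists because S4 is finite."""
    power, n = a, 1
    while power != IDENTITY:
        power = compose(a, power)
        n += 1
    return n


order_counts = Counter(order(a) for a in S4)
```

The resulting counts are 1 element of order 1, 9 of order 2 (six transpositions and three products of two disjoint transpositions), 8 of order 3, and 6 of order 4. All of these orders divide 24.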

Corollary 4.9. If $p$ is a prime, then the only group of order $p$, up to isomorphism, is the cyclic group $C_p$, and it has no subgroups other than $\{1\}$ and $C_p$ itself.

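PROOF. Suppose that $G$ is a finite group of order $p$ and that $H \neq \{1\}$ is a subgroup of $G$. Let $a \neq 1$ be in $H$, and let $P = \{a^n \mid n \in \mathbb{Z}\}$. Since $a \neq 1$, Corollary 4.8 shows that the order of $a$ is an integer $> 1$ that divides $p$. Since $p$ is prime, the order of $a$ must equal $p$. Then $|P| = p$. Since $P \subseteq H \subseteq G$ and $|G| = p$, we must have $P = H = G$. Thus every subgroup $H \neq \{1\}$ equals $G$, and taking $H = G$ shows that $G = P$ is cyclic of order $p$, hence isomorphic to $C_p$. □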




Let $G_1$ and $G_2$ be groups. We say that $\varphi: G_1 \to G_2$ is a homomorphism if $\varphi(ab) = \varphi(a)\varphi(b)$ for all $a$ and $b$ in $G_1$. In other words, $\varphi$ is to respect products, but it is not assumed that $\varphi$ is one-one or onto. Any homomorphism $\varphi$ automatically respects the identity and inverses, in the sense that

• $\varphi(1) = 1$ (since $\varphi(1) = \varphi(1 \cdot 1) = \varphi(1)\varphi(1)$),

• $\varphi(a^{-1}) = \varphi(a)^{-1}$ (since $1 = \varphi(1) = \varphi(aa^{-1}) = \varphi(a)\varphi(a^{-1})$ and similarly $1 = \varphi(a^{-1})\varphi(a)$).

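Examples. The following functions are homomorphisms: any isomorphism, the function $\varphi: \mathbb{Z} \to \mathbb{Z}/m\mathbb{Z}$ given by $\varphi(k) = k \bmod m$, the function $\varphi: \mathfrak{S}_n \to \{\pm 1\}$ given by $\varphi(\sigma) = \operatorname{sgn} \sigma$, the function $\varphi: \mathbb{Z} \to G$ given for fixed $a$ in $G$ by $\varphi(n) = a^n$, and the function $\varphi: GL(n, \mathbb{F}) \to \mathbb{F}^\times$ given by $\varphi(A) = \det A$.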

The image of a homomorphism $\varphi: G_1 \to G_2$ is just the image of $\varphi$ considered as a function. It is denoted by $\operatorname{image} \varphi = \varphi(G_1)$ and is necessarily a subgroup of $G_2$ since if $\varphi(g_1) = g_2$ and $\varphi(g_1') = g_2'$, then $\varphi(g_1g_1') = g_2g_2'$ and $\varphi(g_1^{-1}) = g_2^{-1}$.

The kernel of a homomorphism $\varphi: G_1 \to G_2$ is the set $\ker \varphi = \varphi^{-1}(\{1\}) = \{x \in G_1 \mid \varphi(x) = 1\}$. This is a subgroup since if $\varphi(x) = 1$ and $\varphi(y) = 1$, then $\varphi(xy) = \varphi(x)\varphi(y) = 1$ and $\varphi(x^{-1}) = \varphi(x)^{-1} = 1$.

The homomorphism $\varphi: G_1 \to G_2$ is one-one if and only if $\ker \varphi$ is the trivial group $\{1\}$. The necessity follows since 1 is already in $\ker \varphi$, and the sufficiency follows since $\varphi(x) = \varphi(y)$ implies that $\varphi(xy^{-1}) = 1$ and therefore that $xy^{-1}$ is in $\ker \varphi = \{1\}$, i.e., that $x = y$.

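The kernel $H$ of a homomorphism $\varphi: G_1 \to G_2$ has the additional property of being a normal subgroup of $G_1$ in the sense that $ghg^{-1}$ is in $H$ whenever $g$ is in $G_1$ and $h$ is in $H$, i.e., $gHg^{-1} \subseteq H$ for every $g$ in $G_1$ (applying this also to $g^{-1}$ gives $gHg^{-1} = H$). In fact, if $h$ is in $\ker \varphi$ and $g$ is in $G_1$, then $\varphi(ghg^{-1}) = \varphi(g)\varphi(h)\varphi(g)^{-1} = \varphi(g)\varphi(g)^{-1} = 1$ shows that $ghg^{-1}$ is in $\ker \varphi$.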

Examples. 

(1) Any subgroup $H$ of an abelian group $G$ is normal since $ghg^{-1} = gg^{-1}h = h$. The alternating subgroup $\mathfrak{A}_n$ of the symmetric group $\mathfrak{S}_n$ is normal since $\mathfrak{A}_n$ is the kernel of the homomorphism $\sigma \mapsto \operatorname{sgn} \sigma$.

(2) The subgroup $H = \{1, (1\ 3)\}$ of $\mathfrak{S}_3$ is not normal since $(1\ 2)H(1\ 2)^{-1} = \{1, (2\ 3)\}$.

(3) If a subgroup $H$ of a group $G$ has just two left cosets, then $H$ is normal even if $G$ is an infinite group. In fact, suppose $G = H \cup g_0H$, a disjoint union, whenever $g_0$ is not in $H$. Taking inverses of all elements of $G$, we see that $G = H \cup Hg_1$, again a disjoint union, whenever $g_1$ is not in $H$. If $g$ in $G$ is given, then either $g$ is in $H$ and $gHg^{-1} = H$, or $g$ is not in $H$ and $gH = Hg$ (both being the complement of $H$ in $G$), so that $gHg^{-1} = H$ in this case as well.
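Examples 1 through 3 can be checked in $\mathfrak{S}_3$ by computing conjugates $gHg^{-1}$ directly. In the sketch below the helper names are hypothetical. Permutations are again encoded as tuples of images and composed right to left.

```python
from itertools import permutations

S3 = list(permutations((1, 2, 3)))   # p[i-1] is the image of i
e = (1, 2, 3)


def compose(s, t):
    """Right-to-left composition: (s t)(x) = s(t(x))."""
    return tuple(s[t[x] - 1] for x in range(3))


def inverse(s):
    inv = [0, 0, 0]
    for x in range(3):
        inv[s[x] - 1] = x + 1
    return tuple(inv)


def conjugate(g, H):
    """The set g H g^{-1}."""
    return frozenset(compose(compose(g, h), inverse(g)) for h in H)


def is_normal(H):
    return all(conjugate(g, H) == frozenset(H) for g in S3)


A3 = [e, (2, 3, 1), (3, 1, 2)]   # kernel of sgn: 1, (1 2 3), (1 3 2)
H = [e, (3, 2, 1)]               # {1, (1 3)}
```

Here `is_normal(A3)` is true and `is_normal(H)` is false. Also `conjugate((2, 1, 3), H)` is $\{1, (2\ 3)\}$, matching Example 2. Since $\mathfrak{A}_3$ has index 2, its left and right cosets coincide, as Example 3 predicts.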
Let $H$ be a subgroup of $G$. Let us look for the circumstances under which $G/H$ inherits a multiplication from $G$. The natural definition is

$$(g_1H)(g_2H) = g_1g_2H,$$

but we have to check that this definition makes sense. The question is whether we get the same left coset as product if we change the representatives of $g_1H$ and $g_2H$ from $g_1$ and $g_2$ to $g_1h_1$ and $g_2h_2$. Since our prospective definition makes $(g_1h_1H)(g_2h_2H) = g_1h_1g_2h_2H$, the question is whether $g_1h_1g_2h_2H$ equals $g_1g_2H$. That is, we ask whether $g_1h_1g_2h_2 = g_1g_2h$ for some $h$ in $H$. If this equality holds, then $h_1g_2h_2 = g_2h$, and hence $g_2^{-1}h_1g_2$ equals $hh_2^{-1}$, which is an element of $H$. Conversely if every expression $g_2^{-1}h_1g_2$ is in $H$, then we can go backwards and see that $g_1h_1g_2h_2 = g_1g_2h$ for some $h$ in $H$, hence see that $G/H$ indeed inherits a multiplication from $G$. Thus a necessary and sufficient condition for $G/H$ to inherit a multiplication from $G$ is that the subgroup $H$ is normal. According to the next proposition, the multiplication inherited by $G/H$ when this condition is satisfied makes $G/H$ into a group.
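The argument above says that coset multiplication on $G/H$ is well defined exactly when every $g_2^{-1}h_1g_2$ lies in $H$. The sketch below compares the two conditions by brute force over all six subgroups of $\mathfrak{S}_3$. The subgroups are listed by hand, which is an assumption of the sketch. The tuple encoding and right-to-left composition follow the text's examples, and the function names are hypothetical.

```python
from itertools import permutations

S3 = list(permutations((1, 2, 3)))   # p[i-1] is the image of i
e = (1, 2, 3)


def compose(s, t):
    """Right-to-left composition: (s t)(x) = s(t(x))."""
    return tuple(s[t[x] - 1] for x in range(3))


def inverse(s):
    inv = [0, 0, 0]
    for x in range(3):
        inv[s[x] - 1] = x + 1
    return tuple(inv)


def left_coset(g, H):
    return frozenset(compose(g, h) for h in H)


def product_well_defined(H):
    """Does g1 h1 g2 h2 H = g1 g2 H hold for all choices of representatives?"""
    for g1 in S3:
        for g2 in S3:
            expected = left_coset(compose(g1, g2), H)
            for h1 in H:
                for h2 in H:
                    if left_coset(compose(compose(g1, h1), compose(g2, h2)), H) != expected:
                        return False
    return True


def conjugation_criterion(H):
    """Is every g2^{-1} h1 g2 in H?"""
    return all(compose(compose(inverse(g), h), g) in H for g in S3 for h in H)


SUBGROUPS = {
    "trivial": [e],
    "<(1 2)>": [e, (2, 1, 3)],
    "<(1 3)>": [e, (3, 2, 1)],
    "<(2 3)>": [e, (1, 3, 2)],
    "A3": [e, (2, 3, 1), (3, 1, 2)],
    "S3": S3,
}
```

The two tests agree on every subgroup. Exactly the trivial subgroup, $\mathfrak{A}_3$, and $\mathfrak{S}_3$ pass, and these are the normal subgroups of $\mathfrak{S}_3$.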

Proposition 4.10. If $H$ is a normal subgroup of a group $G$, then $G/H$ becomes a group under the inherited multiplication $(g_1H)(g_2H) = (g_1g_2)H$, and the function $q: G \to G/H$ given by $q(g) = gH$ is a homomorphism of $G$ onto $G/H$ with kernel $H$. Consequently every normal subgroup of $G$ is the kernel of some homomorphism.

Remarks. When $H$ is normal, the group $G/H$ is called a quotient group of $G$, and the homomorphism $q: G \to G/H$ is called the quotient homomorphism.⁵ In the special case that $G = \mathbb{Z}$ and $H = m\mathbb{Z}$, the construction reduces to the construction of the additive group of integers modulo $m$ and accounts for using the notation $\mathbb{Z}/m\mathbb{Z}$ for that group.

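PROOF. The coset $1H$ is the identity, and $(gH)^{-1} = g^{-1}H$. Also, the computation $(g_1H\,g_2H)\,g_3H = g_1g_2g_3H = g_1H\,(g_2H\,g_3H)$ proves associativity. Certainly $q$ is onto $G/H$. It is a homomorphism since $q(g_1g_2) = g_1g_2H = g_1H\,g_2H = q(g_1)q(g_2)$. Its kernel consists of those $g$ with $gH = H$, i.e., of the elements of $H$. □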

In analogy with what was shown for vector spaces in Proposition 2.25, quotients in the context of groups allow for the factorization of certain homomorphisms of groups. The appropriate result is stated as Proposition 4.11 and is pictured in Figure 4.1. We can continue from there along the lines of Section II.5.


⁵Some authors call $G/H$ a "factor group." A "factor set," however, is something different.
Proposition 4.11. Let $\varphi: G_1 \to G_2$ be a homomorphism between groups, let $H_0 = \ker \varphi$, let $H$ be a normal subgroup of $G_1$ contained in $H_0$, and define $q: G_1 \to G_1/H$ to be the quotient homomorphism. Then there exists a homomorphism $\bar\varphi: G_1/H \to G_2$ such that $\varphi = \bar\varphi \circ q$, i.e., $\bar\varphi(g_1H) = \varphi(g_1)$. It has the same image as $\varphi$, and $\ker \bar\varphi = \{h_0H \mid h_0 \in H_0\}$.

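$$\begin{array}{ccc}
G_1 & \xrightarrow{\ \varphi\ } & G_2 \\
{\scriptstyle q}\big\downarrow & \nearrow{\scriptstyle \bar\varphi} & \\
G_1/H & &
\end{array}$$

FIGURE 4.1. Factorization of homomorphisms of groups via the quotient of a group by a normal subgroup.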

Remark. One says that $\varphi$ factors through $G_1/H$ or descends to $G_1/H$. See Figure 4.1.

PROOF. We will have $\bar\varphi \circ q = \varphi$ if and only if $\bar\varphi$ satisfies $\bar\varphi(g_1H) = \varphi(g_1)$. What needs proof is that $\bar\varphi$ is well defined. Thus suppose that $g_1$ and $g_1'$ are in the same left coset, so that $g_1' = g_1h$ with $h$ in $H$. Then $\varphi(g_1') = \varphi(g_1)\varphi(h) = \varphi(g_1)$ since $H \subseteq \ker \varphi$, and $\bar\varphi$ is therefore well defined.

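The computation $\bar\varphi(g_1H\,g_2H) = \bar\varphi(g_1g_2H) = \varphi(g_1g_2) = \varphi(g_1)\varphi(g_2) = \bar\varphi(g_1H)\bar\varphi(g_2H)$ shows that $\bar\varphi$ is a homomorphism. Since $q$ is onto $G_1/H$, $\operatorname{image} \bar\varphi = \operatorname{image} \varphi$; thus $\bar\varphi$ is onto $\operatorname{image} \varphi$. Finally $\ker \bar\varphi$ consists of all $g_1H$ such that $\bar\varphi(g_1H) = 1$. Since $\bar\varphi(g_1H) = \varphi(g_1)$, the condition that $g_1$ is to satisfy is that $g_1$ be in $\ker \varphi = H_0$. Hence $\ker \bar\varphi = \{h_0H \mid h_0 \in H_0\}$, as asserted. □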

Corollary 4.12. Let $\varphi: G_1 \to G_2$ be a homomorphism between groups, and suppose that $\varphi$ is onto $G_2$ and has kernel $H$. Then $\varphi$ exhibits the group $G_1/H$ as canonically isomorphic to $G_2$.

Proof. Take $H = H_0$ in Proposition 4.11, and form $\bar\varphi: G_1/H \to G_2$ with $\varphi = \bar\varphi \circ q$. The proposition shows that $\bar\varphi$ is onto $G_2$ and has trivial kernel, i.e., a kernel consisting only of the identity element $H$ of $G_1/H$. Having trivial kernel, $\bar\varphi$ is one-one. □
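For a concrete instance of Corollary 4.12, take $\varphi = \operatorname{sgn}: \mathfrak{S}_3 \to \{\pm 1\}$, which is onto with kernel $\mathfrak{A}_3$. The sketch below builds $\bar\varphi$ on the two cosets of $\mathfrak{A}_3$. It then checks that $\bar\varphi$ is well defined and that it is a one-one homomorphism onto $\{\pm 1\}$. The helper names are hypothetical, and the sign is computed by counting inversions.

```python
from itertools import permutations

S3 = list(permutations((1, 2, 3)))   # p[i-1] is the image of i


def compose(s, t):
    """Right-to-left composition: (s t)(x) = s(t(x))."""
    return tuple(s[t[x] - 1] for x in range(3))


def sgn(s):
    """Sign of a permutation via the parity of its number of inversions."""
    inversions = sum(1 for a in range(3) for b in range(a + 1, 3) if s[a] > s[b])
    return -1 if inversions % 2 else 1


kernel = [g for g in S3 if sgn(g) == 1]   # the alternating group A3


def coset(g):
    return frozenset(compose(g, h) for h in kernel)


# phi_bar(g A3) = sgn(g); record a clash if two representatives disagree.
phi_bar, well_defined = {}, True
for g in S3:
    if phi_bar.setdefault(coset(g), sgn(g)) != sgn(g):
        well_defined = False

is_homomorphism = all(
    phi_bar[coset(compose(g, h))] == phi_bar[coset(g)] * phi_bar[coset(h)]
    for g in S3 for h in S3
)
```

The dictionary `phi_bar` ends up with exactly two keys, the cosets of $\mathfrak{A}_3$, mapped to $+1$ and $-1$. So $\mathfrak{S}_3/\mathfrak{A}_3 \cong \{\pm 1\}$ in this example.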

Theorem 4.13 (First Isomorphism Theorem). Let $\varphi: G_1 \to G_2$ be a homomorphism between groups, and suppose that $\varphi$ is onto $G_2$ and has kernel $K$. Then the map $H_1 \mapsto \varphi(H_1)$ gives a one-one correspondence between

(a) the subgroups $H_1$ of $G_1$ containing $K$ and

(b) the subgroups of $G_2$.

Under this correspondence normal subgroups correspond to normal subgroups. If $H_1$ is normal in $G_1$, then $gH_1 \mapsto \varphi(g)\varphi(H_1)$ is an isomorphism of $G_1/H_1$ onto $G_2/\varphi(H_1)$.
Remark. In the special case of the last statement that $\varphi: G_1 \to G_2$ is a quotient map $q: G \to G/K$ and $H$ is a normal subgroup of $G$ containing $K$, the last statement of the theorem asserts the isomorphism

$$G/H \cong (G/K)/(H/K).$$

PROOF. The passage from (a) to (b) is by direct image under $\varphi$, and the passage from (b) to (a) will be by inverse image under $\varphi^{-1}$. Certainly the direct image of a subgroup as in (a) is a subgroup as in (b). To prove the one-one correspondence, we are to show that the inverse image of a subgroup as in (b) is a subgroup as in (a) and that these two constructions invert one another.

For any subgroup $H_2$ of $G_2$, $\varphi^{-1}(H_2)$ is a subgroup of $G_1$. In fact, if $g_1$ and $g_1'$ are in $\varphi^{-1}(H_2)$, we can write $\varphi(g_1) = h_2$ and $\varphi(g_1') = h_2'$ with $h_2$ and $h_2'$ in $H_2$. Then the equations $\varphi(g_1g_1') = h_2h_2'$ and $\varphi(g_1^{-1}) = \varphi(g_1)^{-1} = h_2^{-1}$ show that $g_1g_1'$ and $g_1^{-1}$ are in $\varphi^{-1}(H_2)$.

Moreover, the subgroup $\varphi^{-1}(H_2)$ contains $\varphi^{-1}(\{1\}) = K$. Therefore the inverse image under $\varphi$ of a subgroup as in (b) is a subgroup as in (a). Since $\varphi$ is onto $G_2$, we have $\varphi(\varphi^{-1}(H_2)) = H_2$. Thus passing from (b) to (a) and back recovers the subgroup of $G_2$.

If $H_1$ is a subgroup of $G_1$ containing $K$, we still need to see that $H_1 = \varphi^{-1}(\varphi(H_1))$. Certainly $H_1 \subseteq \varphi^{-1}(\varphi(H_1))$. For the reverse inclusion let $g_1$ be in $\varphi^{-1}(\varphi(H_1))$. Then $\varphi(g_1)$ is in $\varphi(H_1)$, i.e., $\varphi(g_1) = \varphi(h_1)$ for some $h_1$ in $H_1$. Since $\varphi$ is a homomorphism, $\varphi(g_1h_1^{-1}) = 1$. Thus $g_1h_1^{-1}$ is in $\ker \varphi = K$, which is contained in $H_1$ by assumption. Then $h_1$ and $g_1h_1^{-1}$ are in $H_1$, and hence their product $(g_1h_1^{-1})h_1 = g_1$ is in $H_1$. We conclude that $\varphi^{-1}(\varphi(H_1)) \subseteq H_1$, and thus passing from (a) to (b) and then back recovers the subgroup of $G_1$ containing $K$.

Next let us show that normal subgroups correspond to normal subgroups. If $H_2$ is normal in $G_2$, let $H_1$ be the subgroup $\varphi^{-1}(H_2)$ of $G_1$. For $h_1$ in $H_1$ and $g_1$ in $G_1$, we can write $\varphi(h_1) = h_2$ with $h_2$ in $H_2$, and then $\varphi(g_1h_1g_1^{-1}) = \varphi(g_1)h_2\varphi(g_1)^{-1}$ is in $\varphi(g_1)H_2\varphi(g_1)^{-1} = H_2$. Hence $g_1h_1g_1^{-1}$ is in $\varphi^{-1}(H_2) = H_1$. In the reverse direction let $H_1$ be normal in $G_1$, and let $g_2$ be in $G_2$. Since $\varphi$ is onto $G_2$, we can write $g_2 = \varphi(g_1)$ for some $g_1$ in $G_1$. Then $g_2\varphi(H_1)g_2^{-1} = \varphi(g_1)\varphi(H_1)\varphi(g_1)^{-1} = \varphi(g_1H_1g_1^{-1}) = \varphi(H_1)$. Thus $\varphi(H_1)$ is normal.

For the final statement let $H_2 = \varphi(H_1)$. We have just proved that this image is normal, and hence $G_2/H_2$ is a group. The mapping $\Phi: G_1 \to G_2/H_2$ given by $\Phi(g_1) = \varphi(g_1)H_2$ is the composition of two homomorphisms, each of them onto, and hence is a homomorphism onto $G_2/H_2$. Its kernel is

$$\{g_1 \in G_1 \mid \varphi(g_1) \in H_2\} = \{g_1 \in G_1 \mid \varphi(g_1) \in \varphi(H_1)\} = \varphi^{-1}(\varphi(H_1)),$$

and this equals $H_1$ by the first conclusion of the theorem. Applying Corollary 4.12 to $\Phi$, we obtain the required isomorphism $\bar\Phi: G_1/H_1 \to G_2/\varphi(H_1)$. □
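A small additive example illustrates the isomorphism in the Remark after Theorem 4.13. Take $G = \mathbb{Z}$, $K = 6\mathbb{Z}$, and $H = 2\mathbb{Z}$, so that $K \subseteq H$ and both are normal because $\mathbb{Z}$ is abelian. Then $H/K = 2\mathbb{Z}/6\mathbb{Z}$ consists of the residues $\{0, 2, 4\}$ mod 6, and

$$\mathbb{Z}/2\mathbb{Z} \cong (\mathbb{Z}/6\mathbb{Z})/(2\mathbb{Z}/6\mathbb{Z}),$$

both sides having two elements; on the right they are the cosets $\{0, 2, 4\}$ and $\{1, 3, 5\}$ of residues mod 6.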
Theorem 4.14 (Second Isomorphism Theorem). Let $H_1$ and $H_2$ be subgroups of a group $G$ with $H_2$ normal in $G$. Then $H_1 \cap H_2$ is a normal subgroup of $H_1$, the set $H_1H_2$ of products is a subgroup of $G$ with $H_2$ as a normal subgroup, and the map $h_1(H_1 \cap H_2) \mapsto h_1H_2$ is a well-defined canonical isomorphism of groups

$$H_1/(H_1 \cap H_2) \cong (H_1H_2)/H_2.$$


PROOF. The set $H_1 \cap H_2$ is a subgroup, being the intersection of two subgroups. For $h_1$ in $H_1$, we have $h_1(H_1 \cap H_2)h_1^{-1} \subseteq h_1H_1h_1^{-1} \subseteq H_1$ since $H_1$ is a subgroup and $h_1(H_1 \cap H_2)h_1^{-1} \subseteq h_1H_2h_1^{-1} \subseteq H_2$ since $H_2$ is normal in $G$. Therefore $h_1(H_1 \cap H_2)h_1^{-1} \subseteq H_1 \cap H_2$, and $H_1 \cap H_2$ is normal in $H_1$.

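The set $H_1H_2$ of products is a subgroup since $h_1h_2h_1'h_2' = h_1h_1'(h_1'^{-1}h_2h_1')h_2'$ and since $(h_1h_2)^{-1} = h_1^{-1}(h_1h_2^{-1}h_1^{-1})$, where the parenthesized factors lie in $H_2$ because $H_2$ is normal in $G$. Also, $H_2$ is normal in $H_1H_2$ since $H_2$ is normal in $G$.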

The function $\varphi(h_1(H_1 \cap H_2)) = h_1H_2$ is well defined since $H_1 \cap H_2 \subseteq H_2$, and $\varphi$ respects products. The domain of $\varphi$ is $\{h_1(H_1 \cap H_2) \mid h_1 \in H_1\}$, and the kernel is the subset of this such that $h_1$ lies in $H_2$ as well as $H_1$. For this to happen, $h_1$ must be in $H_1 \cap H_2$, and thus the kernel is the identity coset of $H_1/(H_1 \cap H_2)$. Hence $\varphi$ is one-one.

To see that $\varphi$ is onto $(H_1H_2)/H_2$, let $h_1h_2H_2$ be given. Then $h_1(H_1 \cap H_2)$ maps to $h_1H_2$, which equals $h_1h_2H_2$. Hence $\varphi$ is onto. □
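Take the additive group $\mathbb{Z}$ with $H_1 = 4\mathbb{Z}$ and $H_2 = 6\mathbb{Z}$. Then $H_1 \cap H_2 = 12\mathbb{Z}$ and $H_1 + H_2 = 2\mathbb{Z}$, so Theorem 4.14 predicts $4\mathbb{Z}/12\mathbb{Z} \cong 2\mathbb{Z}/6\mathbb{Z}$. The sketch below represents each coset by a residue. It then checks that $h_1 + 12\mathbb{Z} \mapsto h_1 + 6\mathbb{Z}$ is a well-defined additive bijection. The choice of 4 and 6, and all names, are illustrative only.

```python
from math import gcd

a, b = 4, 6                      # H1 = aZ, H2 = bZ (H2 is normal since Z is abelian)
m = a * b // gcd(a, b)           # H1 ∩ H2 = mZ with m = lcm(a, b) = 12
g = gcd(a, b)                    # H1 + H2 = gZ with g = gcd(a, b) = 2

# Cosets h1 + mZ with h1 in aZ, represented by residues mod m.
domain = sorted({(a * n) % m for n in range(m)})
# Cosets of bZ inside gZ, represented by residues mod b.
target = sorted({(g * n) % b for n in range(b)})

# The map of Theorem 4.14: h1 + (H1 ∩ H2) -> h1 + H2.
# It is well defined because b divides m, so r mod b depends only on r mod m.
phi = {r: r % b for r in domain}

is_bijection = sorted(phi.values()) == target
is_additive = all(phi[(r + s) % m] == (phi[r] + phi[s]) % b for r in domain for s in domain)
```

Here `domain` is `[0, 4, 8]` and `phi` sends it to `0, 4, 2`, which is a rearrangement of `target`, namely `[0, 2, 4]`.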


3.  Direct  Products  and  Direct  Sums 

We return to the matter of direct products and direct sums of groups, direct products having been discussed briefly in Section 1. In a footnote in Section II.4 we mentioned a general principle in algebra that "whenever a new systematic construction appears for the objects under study, it is well to look for a corresponding construction with the functions relating these new objects." This principle will be made more precise in Section 11 of the present chapter with the aid of the language of "categories" and "functors."

Another principle that will be relevant for us is that constructions in one context in algebra often recur, sometimes in slightly different guise, in other contexts. One example of the operation of this principle occurs with quotients. The construction and properties of the quotient of a vector space by a vector subspace, as in Section II.5, are analogous in this sense to the construction and properties of the quotient of a group by a normal subgroup, as in Section 2 of the present chapter. The need for the subgroup to be normal is an example of what is meant by "slightly different guise." This principle too will be made more precise in Section 11 of the present chapter using the language of categories and functors.
Let us proceed with an awareness of both these principles in connection with direct products and direct sums of groups, looking for analogies with what happened for vector spaces and expecting our work to involve constructions with homomorphisms as well as with groups.

The external direct product $G_1 \times G_2$ was defined as a group in Section 1 to be the set-theoretic product with coordinate-by-coordinate multiplication. There are four homomorphisms of interest connected with $G_1 \times G_2$, namely

$$\begin{array}{llll}
i_1: & G_1 \to G_1 \times G_2 & \text{given by} & i_1(g_1) = (g_1, 1), \\
i_2: & G_2 \to G_1 \times G_2 & \text{given by} & i_2(g_2) = (1, g_2), \\
p_1: & G_1 \times G_2 \to G_1 & \text{given by} & p_1(g_1, g_2) = g_1, \\
p_2: & G_1 \times G_2 \to G_2 & \text{given by} & p_2(g_1, g_2) = g_2.
\end{array}$$

Recall from the discussion before Proposition 4.5 that Proposition 2.30 for the direct product of two vector spaces does not translate directly into an analog for the direct product of groups; instead that proposition is replaced by Proposition 4.5, which involves a commutativity condition.

Warned by this anomaly, let us work with mappings rather than with groups and subgroups, and let us use mappings in formulating a definition of the direct product of groups. As with the direct product of two vector spaces, the mappings to use are $p_1$ and $p_2$ but not $i_1$ and $i_2$. The way in which $p_1$ and $p_2$ enter is through the effect of the direct product on homomorphisms. If $\varphi_1: H \to G_1$ and $\varphi_2: H \to G_2$ are two homomorphisms, then $h \mapsto (\varphi_1(h), \varphi_2(h))$ is the corresponding homomorphism of $H$ into $G_1 \times G_2$. In order to state matters fully, let us give the definition with an arbitrary number of factors.

Let $S$ be an arbitrary nonempty set of groups, and let $G_s$ be the group corresponding to the member $s$ of $S$. The external direct product of the $G_s$'s consists of a group $\prod_{s \in S} G_s$ and a system of group homomorphisms. The group as a set is $\times_{s \in S} G_s$, whose elements are arbitrary functions from $S$ to $\bigcup_{s \in S} G_s$ such that the value of the function at $s$ is in $G_s$, and the group law is $(\{g_s\}_{s \in S})(\{g_s'\}_{s \in S}) = \{g_sg_s'\}_{s \in S}$. The group homomorphisms are the coordinate mappings $p_{s_0}: \prod_{s \in S} G_s \to G_{s_0}$ with $p_{s_0}(\{g_s\}_{s \in S}) = g_{s_0}$. The individual groups $G_s$ are called the factors, and a direct product of $n$ groups may be written as $G_1 \times \cdots \times G_n$ instead of with the symbol $\prod$. The group $\prod_{s \in S} G_s$ has the universal mapping property described in Proposition 4.15 and pictured in Figure 4.2.

Proposition 4.15 (universal mapping property of external direct product). Let $\{G_s \mid s \in S\}$ be a nonempty set of groups, and let $\prod_{s \in S} G_s$ be the external direct product, the associated group homomorphisms being the coordinate mappings $p_{s_0}: \prod_{s \in S} G_s \to G_{s_0}$. If $H$ is any group and $\{\varphi_s \mid s \in S\}$ is a system of group homomorphisms $\varphi_s: H \to G_s$, then there exists a unique group homomorphism $\varphi: H \to \prod_{s \in S} G_s$ such that $p_{s_0} \circ \varphi = \varphi_{s_0}$ for all $s_0 \in S$.
FIGURE 4.2. Universal mapping property of an external direct product of groups.


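Proof. Existence of $\varphi$ is proved by taking $\varphi(h) = \{\varphi_s(h)\}_{s \in S}$; this is a homomorphism because each $\varphi_s$ is a homomorphism and the group law in $\prod_{s \in S} G_s$ is coordinatewise. Then $p_{s_0}(\varphi(h)) = p_{s_0}(\{\varphi_s(h)\}_{s \in S}) = \varphi_{s_0}(h)$ as required. For uniqueness let $\varphi': H \to \prod_{s \in S} G_s$ be a homomorphism with $p_{s_0} \circ \varphi' = \varphi_{s_0}$ for all $s_0 \in S$. For each $h$ in $H$, we can write $\varphi'(h) = \{\varphi'(h)_s\}_{s \in S}$. For $s_0$ in $S$, we then have $\varphi_{s_0}(h) = (p_{s_0} \circ \varphi')(h) = p_{s_0}(\varphi'(h)) = \varphi'(h)_{s_0}$, and we conclude that $\varphi' = \varphi$. □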
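The existence half of the proof is a one-line construction. Here is a sketch of it with a hypothetical choice of groups: $H = \mathbb{Z}$, $G_1 = \mathbb{Z}/4\mathbb{Z}$, and $G_2 = \mathbb{Z}/6\mathbb{Z}$, all written additively, with $\varphi_1$ and $\varphi_2$ the reduction maps. The names `product_map`, `p`, and `add` are illustrative.

```python
def phi1(n):
    """Reduction Z -> Z/4Z."""
    return n % 4


def phi2(n):
    """Reduction Z -> Z/6Z."""
    return n % 6


def product_map(*phis):
    """Return h -> (phi_s(h))_s, the homomorphism constructed in the proof."""
    return lambda h: tuple(f(h) for f in phis)


def p(s, x):
    """Coordinate projection p_s of the direct product."""
    return x[s]


def add(x, y):
    """Coordinatewise group law of Z/4Z x Z/6Z."""
    return ((x[0] + y[0]) % 4, (x[1] + y[1]) % 6)


phi = product_map(phi1, phi2)
```

The image of this $\varphi$ has only 12 of the 24 elements of $\mathbb{Z}/4\mathbb{Z} \times \mathbb{Z}/6\mathbb{Z}$. This is a reminder that the homomorphism given by the universal mapping property need not be onto.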


Now we give an abstract definition of direct product that allows for the possibility that the direct product is "internal" in the sense that the various factors are identified as subgroups of a given group. The definition is by means of the above universal mapping property and will be seen to characterize the direct product up to canonical isomorphism. Let $S$ be an arbitrary nonempty set of groups, and let $G_s$ be the group corresponding to the member $s$ of $S$. A direct product of the $G_s$'s consists of a group $G$ and a system of group homomorphisms $p_s: G \to G_s$ for $s \in S$ with the following universal mapping property: whenever $H$ is a group and $\{\varphi_s \mid s \in S\}$ is a system of group homomorphisms $\varphi_s: H \to G_s$, then there exists a unique group homomorphism $\varphi: H \to G$ such that $p_s \circ \varphi = \varphi_s$ for all $s \in S$. Proposition 4.15 proves existence of a direct product, and the next proposition addresses uniqueness. A direct product is internal if each $G_s$ is a subgroup of $G$ and each restriction $p_s\big|_{G_s}$ is the identity map.


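$$\begin{array}{ccc}
 & & G_s \\
 & {\scriptstyle \varphi_s}\nearrow & \big\uparrow{\scriptstyle p_s} \\
H & \xrightarrow{\ \varphi\ } & G
\end{array}$$

FIGURE 4.3. Universal mapping property of a direct product of groups.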


Proposition 4.16. Let $S$ be a nonempty set of groups, and let $G_s$ be the group corresponding to the member $s$ of $S$. If $(G, \{p_s\})$ and $(G', \{p_s'\})$ are two direct products, then the homomorphisms $p_s: G \to G_s$ and $p_s': G' \to G_s$ are onto $G_s$, there exists a unique homomorphism $\Phi: G' \to G$ such that $p_s' = p_s \circ \Phi$ for all $s \in S$, and $\Phi$ is an isomorphism.

Proof. In Figure 4.3 let $H = G'$ and $\varphi_s = p_s'$. If $\Phi: G' \to G$ is the homomorphism produced by the fact that $G$ is a direct product, then we have $p_s \circ \Phi = p_s'$ for all $s$. Reversing the roles of $G$ and $G'$, we obtain a homomorphism $\Phi': G \to G'$ with $p_s' \circ \Phi' = p_s$ for all $s$. Therefore $p_s \circ (\Phi \circ \Phi') = p_s' \circ \Phi' = p_s$.

In Figure 4.3 we next let $H = G$ and $\varphi_s = p_s$ for all $s$. Then the identity $1_G$ on $G$ has the same property $p_s \circ 1_G = p_s$ relative to all $p_s$ that $\Phi \circ \Phi'$ has, and the uniqueness says that $\Phi \circ \Phi' = 1_G$. Reversing the roles of $G$ and $G'$, we obtain $\Phi' \circ \Phi = 1_{G'}$. Therefore $\Phi$ is an isomorphism.

For uniqueness suppose that $\Psi: G' \to G$ is another homomorphism with $p_s' = p_s \circ \Psi$ for all $s \in S$. Then the argument of the previous paragraph shows that $\Phi' \circ \Psi = 1_{G'}$. Applying $\Phi$ on the left gives $\Psi = (\Phi \circ \Phi') \circ \Psi = \Phi \circ (\Phi' \circ \Psi) = \Phi \circ 1_{G'} = \Phi$. Thus $\Psi = \Phi$.

Finally we have to show that the $s$th mapping of a direct product is onto $G_s$. It is enough to show that $p_s'$ is onto $G_s$. Taking $G$ as the external direct product $\prod_{s \in S} G_s$ with $p_s$ equal to the coordinate mapping, form the isomorphism $\Phi': G \to G'$ that has just been proved to exist. This satisfies $p_s = p_s' \circ \Phi'$ for all $s \in S$. Since $p_s$ is onto $G_s$, $p_s'$ must be onto $G_s$. □

Let us turn to direct sums. Part of what we seek is a definition that allows for an abstract characterization of direct sums in the spirit of Proposition 4.16. In particular, the interaction with homomorphisms is to be central to the discussion. In the case of two factors, we use $i_1$ and $i_2$ rather than $p_1$ and $p_2$. If $\varphi_1: G_1 \to H$ and $\varphi_2: G_2 \to H$ are two homomorphisms, then the corresponding homomorphism $\varphi$ of $G_1 \oplus G_2$ to $H$ is to satisfy $\varphi_1 = \varphi \circ i_1$ and $\varphi_2 = \varphi \circ i_2$. With $G_1 \oplus G_2$ defined, as expected, to be the same group as $G_1 \times G_2$, we are led to the formula

$$\varphi(g_1, g_2) = \varphi((g_1, 1)(1, g_2)) = \varphi(g_1, 1)\,\varphi(1, g_2) = \varphi_1(g_1)\varphi_2(g_2).$$

The elements $(g_1, 1)$ and $(1, g_2)$ commute in $G_1 \oplus G_2$, and the images of commuting elements under a homomorphism have to commute. Thus $\varphi_1(g_1)$ must commute with $\varphi_2(g_2)$ in $H$, and hence $H$ had better be abelian. Then in order to have an analog of Proposition 4.16, we will want to specialize $H$ at some point to $G_1 \oplus G_2$, and therefore $G_1$ and $G_2$ had better be abelian. With these observations in place, we are ready for the general definition.
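A small example shows concretely why $H$ must be abelian. Take $G_1 = G_2 = \mathbb{Z}/2\mathbb{Z}$, written additively with elements 0 and 1, and the nonabelian group $H = \mathfrak{S}_3$. The hypothetical homomorphisms $\varphi_1$ and $\varphi_2$ send 1 to $(1\ 2)$ and to $(2\ 3)$, respectively. With these choices, the forced formula $\varphi(g_1, g_2) = \varphi_1(g_1)\varphi_2(g_2)$ fails to be a homomorphism. Permutations are tuples of images composed right to left, an encoding assumed only for this sketch.

```python
from itertools import product

e = (1, 2, 3)      # permutations of {1, 2, 3} as tuples of images
t12 = (2, 1, 3)    # (1 2)
t23 = (1, 3, 2)    # (2 3)


def compose(s, t):
    """Right-to-left composition: (s t)(x) = s(t(x))."""
    return tuple(s[t[x] - 1] for x in range(3))


def phi1(g):
    """Z/2Z -> S3 sending 1 to (1 2); a homomorphism since (1 2)^2 = 1."""
    return t12 if g else e


def phi2(g):
    """Z/2Z -> S3 sending 1 to (2 3); a homomorphism since (2 3)^2 = 1."""
    return t23 if g else e


def phi(x):
    """The only candidate with phi o i_1 = phi1 and phi o i_2 = phi2."""
    return compose(phi1(x[0]), phi2(x[1]))


def add(x, y):
    """Group law of Z/2Z (+) Z/2Z."""
    return ((x[0] + y[0]) % 2, (x[1] + y[1]) % 2)


ELEMENTS = list(product((0, 1), repeat=2))
phi_is_homomorphism = all(
    phi(add(x, y)) == compose(phi(x), phi(y)) for x in ELEMENTS for y in ELEMENTS
)
```

The failure occurs at $x = (0, 1)$ and $y = (1, 0)$. There $\varphi(x + y) = (1\ 2)(2\ 3)$, while $\varphi(x)\varphi(y) = (2\ 3)(1\ 2)$, and these two permutations differ.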

Let $S$ be an arbitrary nonempty set of abelian groups, and let $G_s$ be the group corresponding to the member $s$ of $S$. We shall use additive notation for the group operation in each $G_s$. The external direct sum of the $G_s$'s consists of an abelian group $\bigoplus_{s\in S} G_s$ and a system of group homomorphisms $i_s$ for $s \in S$. The group is the subgroup of $\prod_{s\in S} G_s$ of all elements that are equal to 0 in all but finitely many coordinates. The group homomorphisms are the mappings $i_{s_0} : G_{s_0} \to \bigoplus_{s\in S} G_s$ carrying a member $g_{s_0}$ of $G_{s_0}$ to the element that is $g_{s_0}$ in coordinate $s_0$ and is 0 at all other coordinates. The individual groups are called the summands, and a direct sum of $n$ abelian groups may be written as $G_1 \oplus \cdots \oplus G_n$. The group $\bigoplus_{s\in S} G_s$ has the universal mapping property described in Proposition 4.17 and pictured in Figure 4.4.
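As an illustration only, here is a minimal Python sketch of the external direct sum in the special case where every summand $G_s$ is a copy of $\mathbb{Z}$. The representation is an assumption of the sketch: an element is a dictionary from labels $s$ to its nonzero coordinates $g_s$, so the condition "0 in all but finitely many coordinates" is built in. The helper names `normalize`, `add`, `neg`, and `embed` are hypothetical.

```python
from typing import Dict, Hashable

# An element of the external direct sum of copies of Z, indexed by labels s.
# Only the finitely many nonzero coordinates are stored.
Elt = Dict[Hashable, int]

def normalize(x: Elt) -> Elt:
    """Drop zero coordinates so each element has exactly one representation."""
    return {s: g for s, g in x.items() if g != 0}

def add(x: Elt, y: Elt) -> Elt:
    """Coordinate-by-coordinate addition; missing coordinates count as 0."""
    out = dict(x)
    for s, g in y.items():
        out[s] = out.get(s, 0) + g
    return normalize(out)

def neg(x: Elt) -> Elt:
    """Additive inverse, taken coordinate by coordinate."""
    return {s: -g for s, g in x.items()}

def embed(s0: Hashable, g: int) -> Elt:
    """The embedding i_{s0}: g in coordinate s0 and 0 at all other coordinates."""
    return normalize({s0: g})

x = add(embed("a", 3), embed("b", -2))
print(x)               # {'a': 3, 'b': -2}
print(add(x, neg(x)))  # {} is the zero element
```

Storing only nonzero coordinates is one convenient choice; it makes equality of elements the same as equality of dictionaries.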
Proposition 4.17 (universal mapping property of external direct sum). Let $\{G_s \mid s \in S\}$ be a nonempty set of abelian groups, and let $\bigoplus_{s\in S} G_s$ be the external direct sum, the associated group homomorphisms being the embedding mappings $i_{s_0} : G_{s_0} \to \bigoplus_{s\in S} G_s$. If $H$ is any abelian group and $\{\varphi_s \mid s \in S\}$ is a system of group homomorphisms $\varphi_s : G_s \to H$, then there exists a unique group homomorphism $\varphi : \bigoplus_{s\in S} G_s \to H$ such that $\varphi \circ i_{s_0} = \varphi_{s_0}$ for all $s_0 \in S$.

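[Diagram: $\varphi_{s_0} : G_{s_0} \to H$, the embedding $i_{s_0} : G_{s_0} \to \bigoplus_{s\in S} G_s$, and $\varphi : \bigoplus_{s\in S} G_s \to H$ with $\varphi \circ i_{s_0} = \varphi_{s_0}$.]

FIGURE 4.4. Universal mapping property of an external direct sum of abelian groups.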

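PROOF. Existence of $\varphi$ is proved by taking $\varphi(\{g_s\}_{s\in S}) = \sum_{s\in S} \varphi_s(g_s)$. The sum on the right side is meaningful since the element $\{g_s\}_{s\in S}$ of the direct sum has only finitely many nonzero coordinates. Since $H$ is abelian, the computation

$$\begin{aligned}
\varphi(\{g_s\}_{s\in S}) + \varphi(\{g'_s\}_{s\in S}) &= \sum_s \varphi_s(g_s) + \sum_s \varphi_s(g'_s)\\
&= \sum_s \big(\varphi_s(g_s) + \varphi_s(g'_s)\big) = \sum_s \varphi_s(g_s + g'_s)\\
&= \varphi(\{g_s + g'_s\}_{s\in S}) = \varphi(\{g_s\}_{s\in S} + \{g'_s\}_{s\in S})
\end{aligned}$$

shows that $\varphi$ is a homomorphism. If $g_{s_0}$ is given and $\{g_s\}_{s\in S}$ denotes the element that is $g_{s_0}$ in the $s_0$th coordinate and is 0 elsewhere, then $\varphi(i_{s_0}(g_{s_0})) = \varphi(\{g_s\}_{s\in S}) = \sum_s \varphi_s(g_s)$, and the right side equals $\varphi_{s_0}(g_{s_0})$ since $g_s = 0$ for all other $s$'s. Thus $\varphi \circ i_{s_0} = \varphi_{s_0}$.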

For uniqueness let $\varphi' : \bigoplus_{s\in S} G_s \to H$ be a homomorphism with $\varphi' \circ i_{s_0} = \varphi_{s_0}$ for all $s_0 \in S$. Then the value of $\varphi'$ is determined at all elements of $\bigoplus_{s\in S} G_s$ that are 0 in all but one coordinate. Since the most general member of $\bigoplus_{s\in S} G_s$ is a finite sum of such elements, $\varphi'$ is determined on all of $\bigoplus_{s\in S} G_s$. □
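The existence half of the proof defines $\varphi$ as a finite sum of the $\varphi_s(g_s)$. A small Python sketch of that formula follows, assuming (for illustration only) that every $G_s$ and $H$ are copies of $\mathbb{Z}$, so each $\varphi_s$ is multiplication by some integer. The labels, the multipliers, and the helper `induced_hom` are hypothetical; the checks test the homomorphism property and $\varphi \circ i_s = \varphi_s$ on sample elements rather than prove them.

```python
from typing import Callable, Dict, Hashable

Elt = Dict[Hashable, int]  # finitely supported element of a direct sum of copies of Z

def add(x: Elt, y: Elt) -> Elt:
    """Coordinate-by-coordinate addition, dropping coordinates that become 0."""
    out = dict(x)
    for s, g in y.items():
        out[s] = out.get(s, 0) + g
    return {s: g for s, g in out.items() if g != 0}

def embed(s0: Hashable, g: int) -> Elt:
    """The embedding i_{s0}."""
    return {s0: g} if g != 0 else {}

def induced_hom(phis: Dict[Hashable, Callable[[int], int]]) -> Callable[[Elt], int]:
    """phi({g_s}) = sum over s of phi_s(g_s); only the stored (nonzero) coordinates contribute."""
    def phi(x: Elt) -> int:
        return sum(phis[s](g) for s, g in x.items())
    return phi

# Hypothetical data: H = Z, and each phi_s is multiplication by an integer.
phis = {"a": lambda n: 2 * n, "b": lambda n: -5 * n, "c": lambda n: 7 * n}
phi = induced_hom(phis)
x, y = add(embed("a", 3), embed("b", 1)), add(embed("b", 4), embed("c", -2))
print(phi(add(x, y)) == phi(x) + phi(y))                   # True: homomorphism property
print(all(phi(embed(s, 5)) == phis[s](5) for s in phis))   # True: phi o i_s = phi_s
```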

Now we give an abstract definition of direct sum that allows for the possibility that the direct sum is “internal” in the sense that the various constituents are identified as subgroups of a given group. Again the definition is by means of a universal mapping property and will be seen to characterize the direct sum up to canonical isomorphism. Let $S$ be an arbitrary nonempty set of abelian groups, and let $G_s$ be the group corresponding to the member $s$ of $S$. A direct sum of the $G_s$'s consists of an abelian group $G$ and a system of group homomorphisms $i_s : G_s \to G$ for $s \in S$ with the following universal mapping property: whenever $H$ is an abelian group and $\{\varphi_s \mid s \in S\}$ is a system of group homomorphisms $\varphi_s : G_s \to H$, then there exists a unique group homomorphism $\varphi : G \to H$ such that $\varphi \circ i_s = \varphi_s$ for all $s \in S$. Proposition 4.17 proves existence of a direct sum, and the next proposition addresses uniqueness. A direct sum is internal if each $G_s$ is a subgroup of $G$ and each mapping $i_s$ is the inclusion mapping.

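[Diagram: $\varphi_s : G_s \to H$, $i_s : G_s \to G$, and $\varphi : G \to H$ with $\varphi \circ i_s = \varphi_s$.]

FIGURE 4.5. Universal mapping property of a direct sum of abelian groups.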

Proposition 4.18. Let $S$ be a nonempty set of abelian groups, and let $G_s$ be the group corresponding to the member $s$ of $S$. If $(G, \{i_s\})$ and $(G', \{i'_s\})$ are two direct sums, then the homomorphisms $i_s : G_s \to G$ and $i'_s : G_s \to G'$ are one-one, there exists a unique homomorphism $\Phi : G \to G'$ such that $i'_s = \Phi \circ i_s$ for all $s \in S$, and $\Phi$ is an isomorphism.

Proof. In Figure 4.5 let $H = G'$ and $\varphi_s = i'_s$. If $\Phi : G \to G'$ is the homomorphism produced by the fact that $G$ is a direct sum, then we have $\Phi \circ i_s = i'_s$ for all $s$. Reversing the roles of $G$ and $G'$, we obtain a homomorphism $\Phi' : G' \to G$ with $\Phi' \circ i'_s = i_s$ for all $s$. Therefore $(\Phi' \circ \Phi) \circ i_s = \Phi' \circ i'_s = i_s$.

In Figure 4.5 we next let $H = G$ and $\varphi_s = i_s$ for all $s$. Then the identity $1_G$ on $G$ has the same property $1_G \circ i_s = i_s$ relative to all $i_s$ that $\Phi' \circ \Phi$ has, and the uniqueness says that $\Phi' \circ \Phi = 1_G$. Reversing the roles of $G$ and $G'$, we obtain $\Phi \circ \Phi' = 1_{G'}$. Therefore $\Phi$ is an isomorphism.

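For uniqueness suppose that $\widetilde\Phi : G \to G'$ is another homomorphism with $i'_s = \widetilde\Phi \circ i_s$ for all $s \in S$. Then the argument of the previous paragraph shows that $\Phi' \circ \widetilde\Phi = 1_G$. Applying $\Phi$ on the left gives $\widetilde\Phi = (\Phi \circ \Phi') \circ \widetilde\Phi = \Phi \circ (\Phi' \circ \widetilde\Phi) = \Phi \circ 1_G = \Phi$. Thus $\widetilde\Phi = \Phi$.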

Finally we have to show that the $s$th mapping of a direct sum is one-one on $G_s$. It is enough to show that $i'_s$ is one-one on $G_s$. Taking $G$ as the external direct sum $\bigoplus_{s\in S} G_s$ with $i_s$ equal to the embedding mapping, form the isomorphism $\Phi' : G' \to G$ that has just been proved to exist. This satisfies $i_s = \Phi' \circ i'_s$ for all $s \in S$. Since $i_s$ is one-one, $i'_s$ must be one-one. □

Example. The group $\mathbb{Q}^\times$ is the direct sum of copies of $\mathbb{Z}$, one for each prime, plus one copy of $\mathbb{Z}/2\mathbb{Z}$. If $p$ is a prime, the mapping $i_p : \mathbb{Z} \to \mathbb{Q}^\times$ is given by $i_p(n) = p^n$. The remaining coordinate gives the sign. The isomorphism results from unique factorization, only finitely many primes being involved for any particular nonzero rational number.
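The isomorphism in this example can be made concrete. The following Python sketch, offered as an illustration rather than as anything in the text, sends a nonzero rational to its coordinates: a sign in $\mathbb{Z}/2\mathbb{Z}$ (written additively, so 1 stands for $-1$) and the finitely many nonzero prime exponents. Trial division is used only for brevity, and the function names are hypothetical.

```python
from fractions import Fraction
from typing import Dict, Tuple

def prime_exponents(n: int) -> Dict[int, int]:
    """Factor a positive integer by trial division, returning {prime: exponent}."""
    exps: Dict[int, int] = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            exps[p] = exps.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        exps[n] = exps.get(n, 0) + 1
    return exps

def coordinates(q: Fraction) -> Tuple[int, Dict[int, int]]:
    """Return (sign coordinate in Z/2Z, {p: exponent of p}) for a nonzero rational q."""
    if q == 0:
        raise ValueError("0 is not in Q^x")
    sign = 0 if q > 0 else 1        # additive notation: 1 in Z/2Z stands for the sign -1
    exps = prime_exponents(abs(q.numerator))
    for p, e in prime_exponents(q.denominator).items():
        exps[p] = -e                # Fraction keeps numerator and denominator coprime
    return sign, exps

print(coordinates(Fraction(-12, 35)))  # (1, {2: 2, 3: 1, 5: -1, 7: -1})
```

Multiplying two rationals corresponds to adding their coordinates, with the sign coordinates added modulo 2.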
4. Rings and Fields

In  this  section  we  begin  a  two-section  digression  in  order  to  develop  some  more 
number  theory  beyond  what  is  in  Chapter  I  and  to  make  some  definitions  as  new 
notions  arise.  In  later  sections  of  the  present  chapter,  some  of  this  material  will 
yield  further  examples  of  concrete  groups  and  tools  for  working  with  them. 

We begin with the additive group $\mathbb{Z}/m\mathbb{Z}$ of integers modulo a positive integer $m$. We continue to write $[a]$ for the equivalence class of the integer $a$ when it is helpful to do so. Our interest will be in the multiplication structure that $\mathbb{Z}/m\mathbb{Z}$ inherits from multiplication in $\mathbb{Z}$. Namely, we attempt to define

$$[a][b] = [ab].$$

To see that this formula is meaningful in $\mathbb{Z}/m\mathbb{Z}$, we need to check that the same equivalence class results on the right side if the representatives of $[a]$ and $[b]$ are changed. Thus let $[a] = [a']$ and $[b] = [b']$. Then $m$ divides $a - a'$ and $b - b'$ and must divide the sum of products $(a - a')b + a'(b - b') = ab - a'b'$. Consequently $[ab] = [a'b']$, and multiplication is well defined. If $x$ and $y$ are in $\mathbb{Z}/m\mathbb{Z}$, their product is often denoted by $xy \bmod m$.
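The well-definedness argument can be spot-checked numerically. This Python sketch (a finite illustration, not a proof) replaces $a$ and $b$ by other representatives $a' = a + jm$ and $b' = b + km$, confirms the identity $(a - a')b + a'(b - b') = ab - a'b'$ used above, and checks that $ab$ and $a'b'$ land in the same class. The function name and the search bounds are hypothetical choices.

```python
import itertools

def check_well_defined(m: int, bound: int = 12) -> bool:
    """Spot-check that [a][b] = [ab] does not depend on the chosen representatives."""
    vals = range(-bound, bound + 1)
    for a, b, j, k in itertools.product(vals, vals, range(-2, 3), range(-2, 3)):
        a2, b2 = a + j * m, b + k * m   # other representatives: [a2] = [a], [b2] = [b]
        # The identity from the text: (a - a')b + a'(b - b') = ab - a'b'.
        assert (a - a2) * b + a2 * (b - b2) == a * b - a2 * b2
        if (a * b) % m != (a2 * b2) % m:
            return False
    return True

print(all(check_well_defined(m) for m in range(1, 13)))  # True
```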

The same kind of argument as just given shows that the associativity of multiplication in $\mathbb{Z}$ and the distributive laws imply corresponding facts about $\mathbb{Z}/m\mathbb{Z}$. The result is that $\mathbb{Z}/m\mathbb{Z}$ is a “commutative ring with identity” in the sense of the following definitions.

A ring is a set $R$ with two operations $R \times R \to R$, usually called addition and multiplication and often denoted by $(a, b) \mapsto a + b$ and $(a, b) \mapsto ab$, such that

(i) $R$ is an abelian group under addition,

(ii) multiplication is associative in the sense that $a(bc) = (ab)c$ for all $a, b, c$ in $R$,

(iii) the two distributive laws

$$a(b + c) = (ab) + (ac) \quad\text{and}\quad (b + c)a = (ba) + (ca)$$

hold for all $a, b, c$ in $R$.

The additive identity is denoted by 0, and the additive inverse of $a$ is denoted by $-a$. A sum $a + (-b)$ is often abbreviated $a - b$. By convention when parentheses are absent, multiplications are to be carried out before additions and subtractions. Thus the distributive laws may be rewritten as

$$a(b + c) = ab + ac \quad\text{and}\quad (b + c)a = ba + ca.$$

A ring $R$ is called a commutative ring if multiplication satisfies the commutative law

(iv) $ab = ba$ for all $a$ and $b$ in $R$.
A ring $R$ is called a ring with identity⁶ if there exists an element 1 such that $1a = a1 = a$ for all $a$ in $R$. It is immediate from the definitions that

• $0a = 0$ and $a0 = 0$ in any ring (since, in the case of the first formula, $0 = 0a - 0a = (0 + 0)a - 0a = 0a + 0a - 0a = 0a$),

• the multiplicative identity is unique in a ring with identity (since $1' = 1'1 = 1$),

• $(-1)a = -a = a(-1)$ in any ring with identity (partly since $0 = 0a = (1 + (-1))a = 1a + (-1)a = a + (-1)a$).

In  a  ring  with  identity,  it  will  be  convenient  not  to  insist  that  the  identity  be 
different  from  the  zero  element  0.  If  1  and  0  do  happen  to  coincide  in  R,  then  it 
readily  follows  that  0  is  the  only  element  of  R,  and  R  is  said  to  be  the  zero  ring. 

The set $\mathbb{Z}$ of integers is a basic example of a commutative ring with identity. Returning to $\mathbb{Z}/m\mathbb{Z}$, suppose now that $m$ is a prime $p$. If $[a]$ is in $\mathbb{Z}/p\mathbb{Z}$ with $a$ in $\{1, 2, \dots, p - 1\}$, then $\mathrm{GCD}(a, p) = 1$ and Proposition 1.2 produces integers $r$ and $s$ with $ar + ps = 1$. Modulo $p$, this equation reads $[a][r] = [1]$. In other words, $[r]$ is a multiplicative inverse of $[a]$. The result is that $\mathbb{Z}/p\mathbb{Z}$, when $p$ is a prime, is a “field” in the sense of the following definition.

A field $\mathbb{F}$ is a commutative ring with identity such that $\mathbb{F} \neq 0$ and such that

(v) to each $a \neq 0$ in $\mathbb{F}$ corresponds an element $a^{-1}$ in $\mathbb{F}$ such that $aa^{-1} = 1$.

In other words, $\mathbb{F}^\times = \mathbb{F} - \{0\}$ is an abelian group under multiplication. Inverses are necessarily unique as a consequence of one of the properties of groups.
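The argument that $\mathbb{Z}/p\mathbb{Z}$ is a field finds $[a]^{-1}$ from integers $r, s$ with $ar + ps = 1$. The extended Euclidean algorithm is one standard way to produce such $r$ and $s$; the Python sketch below uses it, though it is not necessarily the construction behind Proposition 1.2. The names `extended_gcd` and `inverse_mod` are hypothetical.

```python
from typing import Tuple

def extended_gcd(a: int, b: int) -> Tuple[int, int, int]:
    """Return (g, r, s) with g = GCD(a, b) = a*r + b*s (extended Euclidean algorithm)."""
    if b == 0:
        return (a, 1, 0)
    g, r, s = extended_gcd(b, a % b)
    # g = b*r + (a % b)*s = b*r + (a - (a//b)*b)*s = a*s + b*(r - (a//b)*s)
    return (g, s, r - (a // b) * s)

def inverse_mod(a: int, m: int) -> int:
    """Inverse of [a] in Z/mZ; it exists exactly when GCD(a, m) = 1."""
    g, r, _ = extended_gcd(a % m, m)
    if g != 1:
        raise ValueError(f"{a} has no inverse modulo {m}")
    return r % m

print(inverse_mod(3, 7))  # 5, since 3 * 5 = 15 ≡ 1 (mod 7)
```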

When $p$ is prime, we shall write $\mathbb{F}_p$ for the field $\mathbb{Z}/p\mathbb{Z}$. Its multiplicative group $\mathbb{F}_p^\times$ has order $p - 1$, and Lagrange's Theorem (Corollary 4.8) immediately implies that $a^{p-1} \equiv 1 \bmod p$ whenever $a$ and $p$ are relatively prime. This result is known as Fermat's Little Theorem.⁷
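Fermat's Little Theorem and the Lagrange argument behind it can be checked for small primes with Python's built-in three-argument `pow`. In this illustrative sketch, `fermat_holds` and `order` are hypothetical helper names; `order` computes the order of $[a]$ in $\mathbb{F}_p^\times$, which by Lagrange's Theorem divides $p - 1$.

```python
def fermat_holds(p: int) -> bool:
    """Check a^(p-1) ≡ 1 (mod p) for every a in {1, ..., p-1}."""
    return all(pow(a, p - 1, p) == 1 for a in range(1, p))

def order(a: int, p: int) -> int:
    """Multiplicative order of [a] in F_p^x, for a not divisible by the prime p."""
    k, x = 1, a % p
    while x != 1:
        x = (x * a) % p
        k += 1
    return k

print(all(fermat_holds(p) for p in [2, 3, 5, 7, 11, 13]))  # True
print(fermat_holds(15))  # False: 2^14 ≡ 4 (mod 15), and 15 is not prime
```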

For general $m$, certain members of $\mathbb{Z}/m\mathbb{Z}$ have multiplicative inverses. The product of two such elements is again one, and the inverse of one is again one. Thus, even though $\mathbb{Z}/m\mathbb{Z}$ need not be a field, the subset $(\mathbb{Z}/m\mathbb{Z})^\times$ of members of $\mathbb{Z}/m\mathbb{Z}$ with multiplicative inverses is a group. The same argument as when $m$ is prime shows that the class of $a$ has an inverse if and only if $\mathrm{GCD}(a, m) = 1$. The number of such classes was defined in Chapter I in terms of the Euler $\varphi$ function as $\varphi(m)$, and a formula for $\varphi(m)$ was obtained in Corollary 1.10. The

⁶Some authors, particularly when discussing only algebra, find it convenient to incorporate the existence of an identity into the definition of a ring. However, in real analysis some important natural rings do not have an identity, and the theory is made more complicated by forcing an identity into the picture. For example the space of integrable functions on $\mathbb{R}$ forms a very natural ring, with convolution as multiplication, and there is no identity; forcing an identity into the picture in such a way that the space remains stable under translations makes the space large and unwieldy. The distinction between working with rings and working with rings with identity will be discussed further in Section 11.

⁷As opposed to Fermat's Last Theorem, which lies deeper.
conclusion is that $(\mathbb{Z}/m\mathbb{Z})^\times$ is an abelian group of order $\varphi(m)$. Application of Lagrange's Theorem yields Euler's generalization of Fermat's Little Theorem, namely that $a^{\varphi(m)} \equiv 1 \bmod m$ for every positive integer $m$ and every integer $a$ relatively prime to $m$.
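For small $m$ the group $(\mathbb{Z}/m\mathbb{Z})^\times$, its order $\varphi(m)$, and Euler's congruence can all be computed directly. The Python sketch below does this by brute force, using the criterion $\mathrm{GCD}(a, m) = 1$ from the text; it counts $\varphi(m)$ rather than using the formula of Corollary 1.10, and all function names are hypothetical.

```python
from math import gcd
from typing import List

def units(m: int) -> List[int]:
    """Representatives in {0, ..., m-1} of the invertible classes: GCD(a, m) = 1."""
    return [a for a in range(m) if gcd(a, m) == 1]

def euler_phi(m: int) -> int:
    """phi(m), counted directly as the order of (Z/mZ)^x."""
    return len(units(m))

def euler_holds(m: int) -> bool:
    """Check a^phi(m) ≡ 1 (mod m) for every a relatively prime to m."""
    ph = euler_phi(m)
    return all(pow(a, ph, m) == 1 % m for a in units(m))

def closed_under_products(m: int) -> bool:
    """The product of two invertible classes is again invertible."""
    u = set(units(m))
    return all(a * b % m in u for a in u for b in u)

print(units(12), euler_phi(12))  # [1, 5, 7, 11] 4
```

The comparison with `1 % m` rather than `1` keeps the case $m = 1$ correct, since every residue modulo 1 is 0.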

More generally, in any ring $R$ with identity, a unit is defined to be any element $a$ such that there exists an element $a^{-1}$ with $aa^{-1} = a^{-1}a = 1$. The element $a^{-1}$ is unique if it exists⁸ and is called the multiplicative inverse of $a$. The units of $R$ form a group denoted by $R^\times$. For example the group $\mathbb{Z}^\times$ consists of $+1$ and $-1$, and the zero ring $R$ has $R^\times = \{0\}$. If $R$ is a nonzero ring, then 0 is not in $R^\times$.

Here are some further examples of fields.

Examples  of  fields. 

(1)  Q,  R,  and  C.  These  are  all  fields. 

(2) $\mathbb{Q}[\theta]$. This was introduced between Examples 8 and 9 of Section 1. It is assumed that $\theta$ is a complex number and that there exists an integer $n > 0$ such that the complex numbers $1, \theta, \theta^2, \dots, \theta^n$ are linearly dependent over $\mathbb{Q}$. The set $\mathbb{Q}[\theta]$ is defined to be the linear span over $\mathbb{Q}$ of all powers $1, \theta, \theta^2, \dots$ of $\theta$, which is the same as the linear span of the finite set $1, \theta, \theta^2, \dots, \theta^{n-1}$. The set $\mathbb{Q}[\theta]$ was shown in Proposition 4.1 to be a subset of $\mathbb{C}$ that is closed under the arithmetic operations, including the passage to reciprocals in the case of the nonzero elements. It is therefore a field.

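(3) A field of 4 elements. Let $\mathbb{F}_4 = \{0, 1, \theta, \theta + 1\}$, where $\theta$ is some symbol not standing for 0 or 1. Define addition in $\mathbb{F}_4$ and multiplication in $\mathbb{F}_4^\times$ by requiring that $a + 0 = 0 + a = a$ for all $a$, that

$$\begin{aligned}
1 + 1 &= 0, & 1 + \theta &= (\theta + 1), & 1 + (\theta + 1) &= \theta,\\
\theta + 1 &= (\theta + 1), & \theta + \theta &= 0, & \theta + (\theta + 1) &= 1,\\
(\theta + 1) + 1 &= \theta, & (\theta + 1) + \theta &= 1, & (\theta + 1) + (\theta + 1) &= 0,
\end{aligned}$$

and that

$$\begin{aligned}
1 \cdot 1 &= 1, & 1\theta &= \theta, & 1(\theta + 1) &= (\theta + 1),\\
\theta 1 &= \theta, & \theta\theta &= (\theta + 1), & \theta(\theta + 1) &= 1,\\
(\theta + 1)1 &= (\theta + 1), & (\theta + 1)\theta &= 1, & (\theta + 1)(\theta + 1) &= \theta.
\end{aligned}$$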

The  result  is  a  field.  With  this  direct  approach  a  certain  amount  of  checking  is 
necessary  to  verify  all  the  properties  of  a  field.  We  shall  return  to  this  matter  in 
Chapter  IX  when  we  consider  finite  fields  more  generally,  and  we  shall  then  have 
a  way  of  constructing  F4  that  avoids  tedious  checking. 
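The "certain amount of checking" for $\mathbb{F}_4$ can be delegated to a machine. The Python sketch below encodes $0, 1, \theta, \theta + 1$ as $0, 1, 2, 3$ (an encoding chosen only for this illustration); under it the addition table above agrees with bitwise XOR, and the multiplication table is copied entry by entry. A brute-force loop then checks the field axioms over all pairs and triples.

```python
import itertools

# Hypothetical encoding of the four elements 0, 1, θ, θ+1 as the integers 0, 1, 2, 3.
ELTS = [0, 1, 2, 3]

def add(a: int, b: int) -> int:
    # Under this encoding the addition table coincides with bitwise XOR.
    return a ^ b

# The multiplication table for nonzero elements, e.g. θθ = θ+1 is (2, 2) -> 3.
MUL_NONZERO = {
    (1, 1): 1, (1, 2): 2, (1, 3): 3,
    (2, 1): 2, (2, 2): 3, (2, 3): 1,
    (3, 1): 3, (3, 2): 1, (3, 3): 2,
}

def mul(a: int, b: int) -> int:
    # 0a = a0 = 0 holds in any ring, so products involving 0 are forced.
    return 0 if a == 0 or b == 0 else MUL_NONZERO[(a, b)]

def is_field() -> bool:
    """Brute-force check of the field axioms over all pairs and triples of elements."""
    pairs = list(itertools.product(ELTS, repeat=2))
    triples = list(itertools.product(ELTS, repeat=3))
    identities = all(add(a, 0) == a and mul(a, 1) == a for a in ELTS)
    commutative = all(add(a, b) == add(b, a) and mul(a, b) == mul(b, a) for a, b in pairs)
    associative = all(add(add(a, b), c) == add(a, add(b, c)) and
                      mul(mul(a, b), c) == mul(a, mul(b, c)) for a, b, c in triples)
    # With commutativity, one distributive law implies the other.
    distributive = all(mul(a, add(b, c)) == add(mul(a, b), mul(a, c)) for a, b, c in triples)
    negatives = all(any(add(a, b) == 0 for b in ELTS) for a in ELTS)
    inverses = all(any(mul(a, b) == 1 for b in ELTS) for a in ELTS if a != 0)
    return identities and commutative and associative and distributive and negatives and inverses

print(is_field())  # True
```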

⁸In fact, if $b$ and $c$ exist with $ab = ca = 1$, then $a$ is a unit with $a^{-1} = b = c$ because $b = 1b = (ca)b = c(ab) = c1 = c$.
In analogy with the theory of groups, we define a subring of a ring to be a
nonempty  subset  that  is  closed  under  addition,  negation,  and  multiplication.  The 
set  2Z  of  even  integers  is  a  subring  of  the  ring  Z  of  integers.  A  subfield  of  a  field  is 
a  subset  containing  0  and  1  that  is  closed  under  addition,  negation,  multiplication, 
and  multiplicative  inverses  for  its  nonzero  elements.  The  set  Q  of  rationals  is  a 
subfield  of  the  field  R  of  reals. 

Intermediate between rings and fields are two kinds of objects, integral domains and division rings, that arise frequently enough to merit their own names.

The setting for the first is a commutative ring $R$. A nonzero element $a$ of $R$ is called a zero divisor if there is some nonzero $b$ in $R$ with $ab = 0$. For example the element 2 in the ring $\mathbb{Z}/6\mathbb{Z}$ is a zero divisor because $2 \cdot 3 = 0$. An integral domain is a nonzero commutative ring with identity having no zero divisors. Fields have no zero divisors since if $a$ and $b$ are nonzero, then $ab = 0$ would force $b = 1b = (a^{-1}a)b = a^{-1}(ab) = a^{-1}0 = 0$ and would give a contradiction; therefore every field is an integral domain. The ring of integers $\mathbb{Z}$ is another example of an integral domain, and the polynomial rings $\mathbb{Q}[X]$ and $\mathbb{R}[X]$ and $\mathbb{C}[X]$ introduced in Section 1.3 are further examples. A cancellation law for multiplication holds in any integral domain:

$$ab = ac \text{ with } a \neq 0 \text{ implies } b = c.$$

In fact, $ab = ac$ implies $a(b - c) = 0$; since $a \neq 0$, $b - c$ must be 0.
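The zero divisors of $\mathbb{Z}/m\mathbb{Z}$ are easy to list by brute force, and the list shows how the cancellation law fails outside integral domains. This Python sketch is an illustration with a hypothetical function name.

```python
from typing import List

def zero_divisors(m: int) -> List[int]:
    """Nonzero classes a in Z/mZ with ab ≡ 0 (mod m) for some nonzero class b."""
    return [a for a in range(1, m) if any(a * b % m == 0 for b in range(1, m))]

print(zero_divisors(6))  # [2, 3, 4]
print(zero_divisors(7))  # []
# Cancellation fails in Z/6Z: 2*1 ≡ 2*4 (mod 6), yet 1 and 4 are different classes.
print(2 * 1 % 6 == 2 * 4 % 6)  # True
```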

The  other  object  with  its  own  name  is  a  division  ring,  which  is  a  nonzero  ring 
with  identity  such  that  every  nonzero  element  is  a  unit.  The  commutative  division 
rings are the fields, and we have encountered only one noncommutative division ring so far. That is the set $\mathbb{H}$ of quaternions, which was introduced in Section 1.
Division  rings  that  are  not  fields  will  play  only  a  minor  role  in  this  book  but  are 
of  great  interest  in  Chapters  II  and  III  of  Advanced  Algebra. 

Let us turn to mappings. A function $\varphi : R \to R'$ between two rings is an isomorphism of rings if $\varphi$ is one-one onto and satisfies $\varphi(a + b) = \varphi(a) + \varphi(b)$ and $\varphi(ab) = \varphi(a)\varphi(b)$ for all $a$ and $b$ in $R$. In other words, $\varphi$ is to be an isomorphism of the additive groups and to satisfy $\varphi(ab) = \varphi(a)\varphi(b)$. Such a mapping carries the identity, if any, in $R$ to the identity of $R'$. The relation “is isomorphic to” is an equivalence relation. Common notation for an isomorphism of rings is $R \cong R'$; because of the symmetry, one can say that $R$ and $R'$ are isomorphic.

A function $\varphi : R \to R'$ between two rings is a homomorphism of rings if $\varphi$ satisfies $\varphi(a + b) = \varphi(a) + \varphi(b)$ and $\varphi(ab) = \varphi(a)\varphi(b)$ for all $a$ and $b$ in $R$. In other words, $\varphi$ is to be a homomorphism of the additive groups and to satisfy $\varphi(ab) = \varphi(a)\varphi(b)$.

Examples  of  homomorphisms  of  rings. 

(1) The mapping $\varphi : \mathbb{Z} \to \mathbb{Z}/m\mathbb{Z}$ given by $\varphi(k) = k \bmod m$.
(2) The evaluation mapping $\varphi : \mathbb{R}[X] \to \mathbb{R}$ given by $P(X) \mapsto P(r)$ for some fixed $r$ in $\mathbb{R}$.

(3) Mappings with the direct product $\mathbb{Z} \times \mathbb{Z}$. The additive group $\mathbb{Z} \times \mathbb{Z}$ becomes a commutative ring with identity under coordinate-by-coordinate multiplication, namely $(a, a')(b, b') = (ab, a'b')$. The identity is $(1, 1)$. Projection $(a, a') \mapsto a$ to the first coordinate is a homomorphism of rings $\mathbb{Z} \times \mathbb{Z} \to \mathbb{Z}$ that carries identity to identity. Inclusion $a \mapsto (a, 0)$ of $\mathbb{Z}$ into the first coordinate is a homomorphism of rings $\mathbb{Z} \to \mathbb{Z} \times \mathbb{Z}$ that does not carry identity to identity.⁹
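Example (3) can be checked on a grid of sample values. In the Python sketch below, offered as an illustration with hypothetical helper names, pairs of integers carry coordinate-by-coordinate addition and multiplication; projection and inclusion both respect the operations, but only projection sends $(1, 1)$ to 1.

```python
from typing import Tuple

Pair = Tuple[int, int]

def add(x: Pair, y: Pair) -> Pair:
    return (x[0] + y[0], x[1] + y[1])

def mul(x: Pair, y: Pair) -> Pair:
    # Coordinate-by-coordinate multiplication; the identity is (1, 1).
    return (x[0] * y[0], x[1] * y[1])

def proj(x: Pair) -> int:
    """Projection (a, a') -> a to the first coordinate."""
    return x[0]

def incl(a: int) -> Pair:
    """Inclusion a -> (a, 0) into the first coordinate."""
    return (a, 0)

print(proj((1, 1)), incl(1))  # 1 (1, 0): only the projection preserves the identity
```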

Proposition 4.19. If $R$ is a ring with identity $1_R$, then there exists a unique homomorphism of rings $\varphi_1 : \mathbb{Z} \to R$ such that $\varphi_1(1) = 1_R$.

PROOF. The formulas for manipulating exponents of an element in a group, when translated into the additive notation for addition in $R$, say that $n \mapsto nr$ satisfies $(m + n)r = mr + nr$ and $(mn)r = m(nr)$ for all $r$ in $R$ and all integers $m$ and $n$. The first of these formulas implies, for any $r$ in $R$, that $\varphi_r(n) = nr$ is a homomorphism between the additive groups of $\mathbb{Z}$ and $R$, and it is certainly uniquely determined by its value for $n = 1$. The distributive laws imply that $\psi_r(r') = r'r$ is another homomorphism of additive groups. Hence $\psi_r \circ \varphi_{r'}$ and $\varphi_{r'r}$ are homomorphisms between the additive groups of $\mathbb{Z}$ and $R$. Since $(\psi_r \circ \varphi_{r'})(1) = \psi_r(r') = r'r = \varphi_{r'r}(1)$, we must have $(\psi_r \circ \varphi_{r'})(m) = \varphi_{r'r}(m)$ for all integers $m$. Thus $(mr')r = m(r'r)$ for all $m$. Putting $r = n1_R$ and $r' = 1_R$ proves the fourth equality of the computation

$$\begin{aligned}
\varphi_1(mn) &= (mn)1_R = m(n1_R)\\
&= m\big(1_R(n1_R)\big) = (m1_R)(n1_R) = \varphi_1(m)\varphi_1(n),
\end{aligned}$$

and shows that $\varphi_1$ is in fact a homomorphism of rings. □
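The homomorphism $\varphi_1(n) = n1_R$ of Proposition 4.19 can be computed in any concrete ring by repeated addition. The Python sketch below assumes, purely as a test case, that $R$ is the ring of 2-by-2 integer matrices, which has an identity but is not commutative; the helpers `n_times`, `madd`, `mneg`, `mmul`, and `phi1` are hypothetical names.

```python
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")

def n_times(n: int, r: T, add: Callable[[T, T], T], zero: T, neg: Callable[[T], T]) -> T:
    """Compute n*r by repeated addition, using n*r = (-n)*(-r) when n < 0."""
    if n < 0:
        return n_times(-n, neg(r), add, zero, neg)
    acc = zero
    for _ in range(n):
        acc = add(acc, r)
    return acc

# Hypothetical test ring: 2x2 integer matrices, a noncommutative ring with identity.
Mat = Tuple[Tuple[int, int], Tuple[int, int]]
ZERO: Mat = ((0, 0), (0, 0))
ONE: Mat = ((1, 0), (0, 1))

def madd(x: Mat, y: Mat) -> Mat:
    return tuple(tuple(x[i][j] + y[i][j] for j in range(2)) for i in range(2))

def mneg(x: Mat) -> Mat:
    return tuple(tuple(-x[i][j] for j in range(2)) for i in range(2))

def mmul(x: Mat, y: Mat) -> Mat:
    return tuple(tuple(sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2))
                 for i in range(2))

def phi1(n: int) -> Mat:
    """The homomorphism of Proposition 4.19 for this ring: n -> n * 1_R."""
    return n_times(n, ONE, madd, ZERO, mneg)

print(phi1(-3))  # ((-3, 0), (0, -3))
```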

The image of a homomorphism $\varphi : R \to R'$ of rings is a subring of $R'$, as is easily checked. The kernel turns out to be more than just a subring of $R$. If $a$ is in the kernel and $b$ is any element of $R$, then $\varphi(ab) = \varphi(a)\varphi(b) = 0\varphi(b) = 0$ and similarly $\varphi(ba) = 0$. Thus the kernel of a ring homomorphism is closed under products of members of the kernel with arbitrary members of $R$. Adapting a definition to this circumstance, one says that an ideal $I$ of $R$ (or two-sided ideal in case of ambiguity) is an additive subgroup such that $ab$ and $ba$ are in $I$ whenever $a$ is in $I$ and $b$ is in $R$. Briefly then, the kernel of a homomorphism of rings is an ideal.

Conversely  suppose  that  I  is  an  ideal  in  a  ring  R.  Since  I  is  certainly  an 
additive  subgroup  of  an  abelian  group,  we  can  form  the  additive  quotient  group 

⁹Sometimes authors who build the existence of an identity into the definition of “ring” insist as
a  matter  of  definition  that  homomorphisms  of  rings  carry  identity  to  identity.  Such  authors  would 
then  exclude  this  particular  mapping  from  consideration  as  a  homomorphism. 
$R/I$. It is customary to write the individual cosets in additive notation, thus as $r + I$. In analogy with Proposition 4.10, we have the following result for the present context.

Proposition 4.20. If $I$ is an ideal in a ring $R$, then a well-defined operation of multiplication is obtained within the additive group $R/I$ by the definition $(r_1 + I)(r_2 + I) = r_1r_2 + I$, and $R/I$ becomes a ring. If $R$ has an identity 1, then $1 + I$ is an identity in $R/I$. With these definitions the function $q : R \to R/I$ given by $q(r) = r + I$ is a ring homomorphism of $R$ onto $R/I$ with kernel $I$. Consequently every ideal of $R$ is the kernel of some homomorphism of rings.

Remarks. When $I$ is an ideal, the ring $R/I$ is called a quotient ring¹⁰ of $R$, and the homomorphism $q : R \to R/I$ is called the quotient homomorphism. In the special case that $R = \mathbb{Z}$ and $I = m\mathbb{Z}$, the construction of $R/I$ reduces to the construction of $\mathbb{Z}/m\mathbb{Z}$ as a ring at the beginning of this section.

PROOF. If we change the representatives of the cosets from $r_1$ and $r_2$ to $r_1 + i_1$ and $r_2 + i_2$ with $i_1$ and $i_2$ in $I$, then $(r_1 + i_1)(r_2 + i_2) = r_1r_2 + (i_1r_2 + r_1i_2 + i_1i_2)$ is in $r_1r_2 + I$ by the closure properties of $I$. Hence multiplication is well defined.

The associativity of this multiplication follows from the associativity of multiplication in $R$ because

$$\begin{aligned}
\big((r_1 + I)(r_2 + I)\big)(r_3 + I) &= (r_1r_2 + I)(r_3 + I) = (r_1r_2)r_3 + I = r_1(r_2r_3) + I\\
&= (r_1 + I)(r_2r_3 + I) = (r_1 + I)\big((r_2 + I)(r_3 + I)\big).
\end{aligned}$$

Similarly the computation

$$\begin{aligned}
(r_1 + I)\big((r_2 + I) + (r_3 + I)\big) &= r_1(r_2 + r_3) + I = (r_1r_2 + r_1r_3) + I\\
&= (r_1 + I)(r_2 + I) + (r_1 + I)(r_3 + I)
\end{aligned}$$

yields one distributive law, and the other distributive law is proved in the same way. If $R$ has an identity 1, then $(1 + I)(r + I) = 1r + I = r + I$ and $(r + I)(1 + I) = r1 + I = r + I$ show that $1 + I$ is an identity in $R/I$.

Finally we know that the quotient map $q : R \to R/I$ is a homomorphism of additive groups, and the computation $q(r_1r_2) = r_1r_2 + I = (r_1 + I)(r_2 + I) = q(r_1)q(r_2)$ shows that $q$ is a homomorphism of rings. □

Examples  of  ideals. 

(1) The ideals in the ring $\mathbb{Z}$ coincide with the additive subgroups and are the sets $m\mathbb{Z}$; the reason each $m\mathbb{Z}$ is an ideal is that if $a$ and $b$ are integers and $m$ divides $a$, then $m$ divides $ab$.

¹⁰Quotient rings are known also as “factor rings.” A “ring of quotients,” however, is something
different. 
(2) The ideals in a field $\mathbb{F}$ are 0 and $\mathbb{F}$ itself, no others; in fact, if $a \neq 0$ is in an ideal and $b$ is in $\mathbb{F}$, then the equality $b = (ba^{-1})a$ shows that $b$ is in the ideal and that the ideal therefore contains all elements of $\mathbb{F}$.

(3) If $R$ is $\mathbb{Q}[X]$ or $\mathbb{R}[X]$ or $\mathbb{C}[X]$, then every ideal $I$ is of the form $I = Rf(X)$ for some polynomial $f(X)$. In fact, we can take $f(X) = 0$ if $I = 0$. If $I \neq 0$, let $f(X)$ be a nonzero member of $I$ of lowest possible degree. If $A(X)$ is in $I$, then Proposition 1.12 shows that $A(X) = f(X)B(X) + C(X)$ with $C(X) = 0$ or $\deg C < \deg f$. The equality $C(X) = A(X) - f(X)B(X)$ shows that $C(X)$ is in $I$, and the minimality of $\deg f$ implies that $C(X) = 0$. Thus $A(X) = f(X)B(X)$.

(4) In a ring $R$ with identity 1, an ideal $I$ is a proper subset of $R$ if and only if 1 is not in $I$. In fact, $I$ is certainly a proper subset if 1 is not in $I$. In the converse direction if 1 is in $I$, then every element $r = r1$, for $r$ in $R$, lies in $I$. Hence $I = R$, and $I$ is not a proper subset.
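Example (3) locates a generator of a nonzero ideal of $\mathbb{Q}[X]$ as a nonzero member of least degree. For the smallest ideal containing two given polynomials $A(X)$ and $B(X)$ (all combinations $A(X)U(X) + B(X)V(X)$), one standard way to find such a member is the Euclidean algorithm, which applies the division of Proposition 1.12 repeatedly. That technique is not spelled out in this passage, so the Python sketch below should be read as an illustration under that assumption. Polynomials are lists of exact rational coefficients $a_0, a_1, \dots$, and the names `trim`, `divide`, and `monic_generator` are hypothetical.

```python
from fractions import Fraction
from typing import List, Tuple

Poly = List[Fraction]   # coefficients a_0, a_1, ..., a_n; [] is the zero polynomial

def trim(p) -> Poly:
    """Convert coefficients to exact rationals and drop trailing zeros."""
    out = [Fraction(c) for c in p]
    while out and out[-1] == 0:
        out.pop()
    return out

def divide(a, f) -> Tuple[Poly, Poly]:
    """Return (B, C) with A = f*B + C and C = 0 or deg C < deg f; f must be nonzero."""
    a, f = trim(a), trim(f)
    if not f:
        raise ZeroDivisionError("division by the zero polynomial")
    q = [Fraction(0)] * max(len(a) - len(f) + 1, 0)
    while len(a) >= len(f):
        shift = len(a) - len(f)
        c = a[-1] / f[-1]          # cancel the leading term of a
        q[shift] = c
        for i, fi in enumerate(f):
            a[i + shift] -= c * fi
        a = trim(a)
    return trim(q), a

def monic_generator(a, b) -> Poly:
    """Monic generator of the ideal generated by A and B, via the Euclidean algorithm."""
    a, b = trim(a), trim(b)
    while b:
        a, b = b, divide(a, b)[1]
    return [c / a[-1] for c in a] if a else []

# X^2 - 1 and X^2 + X - 2 generate the same ideal as X - 1.
print(monic_generator([-1, 0, 1], [-2, 1, 1]) == [-1, 1])  # True
```

Exact `Fraction` arithmetic is used so that each leading coefficient cancels to exactly zero, which floating point would not guarantee.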


In  analogy  with  what  was  shown  for  vector  spaces  in  Proposition  2.25  and 
for  groups  in  Proposition  4.11,  quotients  in  the  context  of  rings  allow  for  the 
factorization  of  certain  homomorphisms  of  rings.  The  appropriate  result  is  stated 
as  Proposition  4.21  and  is  pictured  in  Figure  4.6. 


Proposition 4.21. Let $\varphi : R_1 \to R_2$ be a homomorphism of rings, let $I_0 = \ker\varphi$, let $I$ be an ideal of $R_1$ contained in $I_0$, and let $q : R_1 \to R_1/I$ be the quotient homomorphism. Then there exists a homomorphism of rings $\bar\varphi : R_1/I \to R_2$ such that $\varphi = \bar\varphi \circ q$, i.e., $\bar\varphi(r_1 + I) = \varphi(r_1)$. It has the same image as $\varphi$, and $\ker\bar\varphi = \{r_0 + I \mid r_0 \in I_0\}$.


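[Diagram: $\varphi : R_1 \to R_2$, $q : R_1 \to R_1/I$, and $\bar\varphi : R_1/I \to R_2$ with $\varphi = \bar\varphi \circ q$.]

FIGURE 4.6. Factorization of homomorphisms of rings via the quotient of a ring by an ideal.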

Remark. One says that $\varphi$ factors through $R_1/I$ or descends to $R_1/I$.

Proof. Proposition 4.11 shows that $\varphi$ descends to a homomorphism $\bar\varphi$ of the additive group of $R_1/I$ into the additive group of $R_2$ and that all the other conclusions hold except possibly for the fact that $\bar\varphi$ respects multiplication. To see that $\bar\varphi$ respects multiplication, we just compute that $\bar\varphi((r + I)(r' + I)) = \bar\varphi(rr' + I) = \varphi(rr') = \varphi(r)\varphi(r') = \bar\varphi(r + I)\bar\varphi(r' + I)$. □
An example of special interest occurs when $\varphi$ is a homomorphism of rings $\varphi : \mathbb{Z} \to R$ and the ideal $m\mathbb{Z}$ of $\mathbb{Z}$ is contained in the kernel of $\varphi$. Then the proposition says that $\varphi$ descends to a homomorphism of rings $\bar\varphi : \mathbb{Z}/m\mathbb{Z} \to R$. We shall make use of this result shortly. But first let us state a different special case as a corollary.

Corollary 4.22. Let $\varphi : R_1 \to R_2$ be a homomorphism of rings, and suppose that $\varphi$ is onto $R_2$ and has kernel $I$. Then $\varphi$ exhibits the ring $R_1/I$ as canonically isomorphic to $R_2$.

PROOF. Take $I = I_0$ in Proposition 4.21, and form $\bar\varphi : R_1/I \to R_2$ with $\varphi = \bar\varphi \circ q$. The proposition shows that $\bar\varphi$ is onto $R_2$ and has trivial kernel, i.e., the kernel is just the zero element $0 + I$ of $R_1/I$. Having trivial kernel, $\bar\varphi$ is one-one. □
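The special case described just before Corollary 4.22, a homomorphism out of $\mathbb{Z}$ descending to $\mathbb{Z}/m\mathbb{Z}$, can be illustrated with $\varphi : \mathbb{Z} \to \mathbb{Z}/n\mathbb{Z}$, $k \mapsto k \bmod n$, whose kernel is $n\mathbb{Z}$. It descends to $\mathbb{Z}/m\mathbb{Z}$ exactly when $m\mathbb{Z} \subseteq n\mathbb{Z}$, that is, when $n$ divides $m$. The Python sketch below, with hypothetical names `descends` and `phi_bar`, spot-checks well-definedness on a few representatives and then checks that the descended map respects both operations for $m = 12$, $n = 4$.

```python
def descends(m: int, n: int) -> bool:
    """Is k + mZ -> k mod n independent of the representative k?  (Spot-check.)"""
    return all((k + j * m) % n == k % n for k in range(m) for j in range(-3, 4))

def phi_bar(m: int, n: int):
    """The descended map Z/mZ -> Z/nZ, k + mZ -> k mod n; requires mZ ⊆ nZ."""
    if m % n != 0:
        raise ValueError("mZ is not contained in the kernel nZ")
    return lambda k: k % n

f = phi_bar(12, 4)
print(descends(12, 4), descends(12, 5))  # True False
```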

Proposition 4.23. Any field $\mathbb{F}$ contains a subfield isomorphic to the rationals $\mathbb{Q}$ or to some field $\mathbb{F}_p$ with $p$ prime.

Remarks. The subfield in the proposition is called the prime field of $\mathbb{F}$. The characteristic of $\mathbb{F}$ is defined to be 0 if the prime field is isomorphic to $\mathbb{Q}$ and to be $p$ if the prime field is isomorphic to $\mathbb{F}_p$.

Proof. Proposition 4.19 produces a homomorphism of rings $\varphi_1 : \mathbb{Z} \to \mathbb{F}$ with $\varphi_1(1) = 1$. The kernel of $\varphi_1$ is an ideal, necessarily of the form $m\mathbb{Z}$ with $m$ an integer $\geq 0$, and the image of $\varphi_1$ is a commutative subring with identity in $\mathbb{F}$. Let $\bar\varphi_1 : \mathbb{Z}/m\mathbb{Z} \to \mathbb{F}$ be the descended homomorphism given by Proposition 4.21. The integer $m$ is not 1, since $\varphi_1(1) = 1 \neq 0$ in a field. Nor can $m$ factor nontrivially, say as $m = rs$, because otherwise $\bar\varphi_1(r)$ and $\bar\varphi_1(s)$ would be nonzero members of $\mathbb{F}$ with $\bar\varphi_1(r)\bar\varphi_1(s) = \bar\varphi_1(rs) = \bar\varphi_1(0) = 0$, in contradiction to the fact that a field has no zero divisors.

Thus $m$ is prime or $m$ is 0. If $m$ is a prime $p$, then $\mathbb{Z}/p\mathbb{Z}$ is a field, and the image of $\bar\varphi_1$ is the required subfield of $\mathbb{F}$. Thus suppose that $m = 0$. Then $\varphi_1$ is one-one, and $\mathbb{F}$ contains a subring with identity isomorphic to $\mathbb{Z}$. Define a function $\Phi_1 : \mathbb{Q} \to \mathbb{F}$ by saying that if $k$ and $l$ are integers with $l \neq 0$, then $\Phi_1(kl^{-1}) = \varphi_1(k)\varphi_1(l)^{-1}$. This is well defined because $\varphi_1(l) \neq 0$ and because $k_1l_1^{-1} = k_2l_2^{-1}$ implies $k_1l_2 = k_2l_1$ and hence $\varphi_1(k_1)\varphi_1(l_2) = \varphi_1(k_2)\varphi_1(l_1)$ and $\varphi_1(k_1)\varphi_1(l_1)^{-1} = \varphi_1(k_2)\varphi_1(l_2)^{-1}$. We readily check that $\Phi_1$ is a homomorphism with kernel 0. Then $\mathbb{F}$ contains the subfield $\Phi_1(\mathbb{Q})$ isomorphic to $\mathbb{Q}$. □
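The proof identifies the characteristic with the nonnegative generator $m$ of $\ker\varphi_1$, that is, the least $m > 0$ with $m \cdot 1 = 0$, or 0 if there is none. The Python sketch below finds $m$ by adding 1 to itself repeatedly. It is an illustration under stated assumptions: the search stops at a hypothetical `limit`, so a return value of 0 only means no such $m$ was found up to that bound, and the $\mathbb{F}_4$ example uses the encoding 0, 1, 2, 3 of $0, 1, \theta, \theta + 1$ with addition by bitwise XOR.

```python
from fractions import Fraction
from typing import Callable, TypeVar

T = TypeVar("T")

def characteristic(one: T, zero: T, add: Callable[[T, T], T], limit: int = 10_000) -> int:
    """Least m > 0 with m*1 = 0, found by repeated addition; 0 if none up to limit.
    A bounded search cannot prove the answer is 0, so that case is only heuristic."""
    acc = one                      # acc holds m*1
    for m in range(1, limit + 1):
        if acc == zero:
            return m
        acc = add(acc, one)
    return 0

print(characteristic(1, 0, lambda a, b: (a + b) % 7))   # 7: the field F_7
print(characteristic(1, 0, lambda a, b: a ^ b))         # 2: F_4 with XOR addition
print(characteristic(Fraction(1), Fraction(0), lambda a, b: a + b, limit=100))  # 0: Q
```

For a ring that is not a field, such as $\mathbb{Z}/15\mathbb{Z}$, the same search returns 15; the argument in the proof that $m$ must be prime uses the absence of zero divisors, which fails there.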


5.  Polynomials  and  Vector  Spaces 


In  this  section  we  complete  the  digression  begun  in  Section  4.  We  shall  be  using 
the  elementary  notions  of  rings  and  fields  established  in  Section  4  in  order  to 
work with (i) polynomials over any commutative ring with identity and (ii) vector
spaces  over  arbitrary  fields. 

It is an important observation that a good deal of what has been proved so far in this book concerning polynomials when $\mathbb{F}$ is $\mathbb{Q}$ or $\mathbb{R}$ or $\mathbb{C}$ remains valid when $\mathbb{F}$ is any field. Specifically all the results in Section 1.3 through Theorem 1.17 on the topic of polynomials in one indeterminate remain valid as long as the
coefficients  are  from  a  field.  The  theory  breaks  down  somewhat  when  one  tries  to 
extend  it  by  allowing  coefficients  that  are  not  in  a  field  or  by  allowing  more  than 
one  indeterminate.  Because  of  this  circumstance  and  because  we  have  not  yet 
announced  a  universal  mapping  property  for  polynomial  rings  and  because  we 
have  not  yet  addressed  the  several-variable  case,  we  shall  briefly  review  matters 
now  while  extending  the  reach  of  the  theory  that  we  have. 

Let $R$ be a nonzero commutative ring with identity, so that $1 \neq 0$. A polynomial in one indeterminate is to be an expression $P(X) = a_nX^n + \cdots + a_2X^2 + a_1X + a_0$ in which $X$ is a symbol, not a variable. Nevertheless, the usual kinds of manipulations with polynomials are to be valid. This description lacks precision because $X$ has not really been defined adequately. To make a precise definition, we remove $X$ from the formalism and simply define the polynomial to be the tuple $(a_0, a_1, \dots, a_n, 0, 0, \dots)$ of its coefficients. Thus a polynomial in one indeterminate with coefficients in $R$ is an infinite sequence of members of $R$ such that all terms of the sequence are 0 from some point on. The indexing of the sequence is to begin with 0, and $X$ is to refer to the polynomial $(0, 1, 0, 0, \dots)$.
We  may  refer  to  a  polynomial  P  as  P(X)  if  we  want  to  emphasize  that  the 
indeterminate  is  called  X.  Addition  and  negation  of  polynomials  are  defined  in 
coordinate-by-coordinate  fashion  by 

$$(a_0, a_1, \ldots, a_n, 0, 0, \ldots) + (b_0, b_1, \ldots, b_n, 0, 0, \ldots) = (a_0 + b_0, a_1 + b_1, \ldots, a_n + b_n, 0, 0, \ldots),$$
$$-(a_0, a_1, \ldots, a_n, 0, 0, \ldots) = (-a_0, -a_1, \ldots, -a_n, 0, 0, \ldots),$$

and the set $R[X]$ of polynomials is then an abelian group isomorphic to the direct
sum of infinitely many copies of the additive group of R. As in Section 1.3, $X^n$
is to be the polynomial whose coefficients are 1 in the $n$th position, with $n \geq 0$,
and 0 in all other positions. Polynomial multiplication is then defined so as to
match multiplication of expressions $a_nX^n + \cdots + a_1X + a_0$ if the product is
expanded out, powers of X are added, and the terms containing like powers of X
are collected. Thus the precise definition is that

$$(a_0, a_1, \ldots, 0, 0, \ldots)(b_0, b_1, \ldots, 0, 0, \ldots) = (c_0, c_1, \ldots, 0, 0, \ldots),$$

where $c_N = \sum_{k=0}^{N} a_kb_{N-k}$. It is a simple matter to check that this multiplication
makes $R[X]$ into a commutative ring.
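In computational terms, the product formula is a convolution of coefficient sequences. The following is a minimal Python sketch. It assumes that polynomials with integer coefficients are stored as finite lists $[a_0, a_1, \ldots, a_n]$, with the trailing zeros of the infinite tuple omitted. The name `poly_mul` is ours, not the book's.

```python
def poly_mul(a: list[int], b: list[int]) -> list[int]:
    """Multiply polynomials stored as coefficient lists [a_0, a_1, ..., a_n]."""
    if not a or not b:
        return []  # an empty list stands for the zero polynomial
    c = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] += ai * bj  # a_i b_j contributes to c_N with N = i + j
    return c

# (1 + X)(1 - X) = 1 - X^2, that is, (1, 0, -1, 0, 0, ...)
print(poly_mul([1, 1], [1, -1]))  # [1, 0, -1]
```

Swapping the two arguments produces the same list, which reflects the commutativity of $R[X]$.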
The polynomial with all entries 0 is denoted by 0 and is called the zero
polynomial. For all polynomials $P = (a_0, \ldots, a_n, 0, \ldots)$ other than 0, the
degree of P, denoted by $\deg P$, is defined to be the largest index $n$ such that
$a_n \neq 0$. In this case, $a_n$ is called the leading coefficient, and $a_nX^n$ is called the
leading term; if $a_n = 1$, the polynomial is called monic. The usual convention
with the 0 polynomial is either to leave its degree undefined or to say that the
degree is $-\infty$; let us follow the latter approach in this section in order not to have
to separate certain formulas into cases.

There is a natural one-one homomorphism of rings $\iota : R \to R[X]$ given by
$\iota(c) = (c, 0, 0, \ldots)$ for $c$ in R. This sends the identity of R to the identity of
$R[X]$. Thus we can identify R with the constant polynomials, i.e., those of
degree $\leq 0$.

If P and Q are nonzero polynomials, then

$$\deg(P + Q) \leq \max(\deg P, \deg Q).$$

In this formula equality holds if $\deg P \neq \deg Q$. In the case of multiplication, let
P and Q have respective leading terms $a_mX^m$ and $b_nX^n$. All the coefficients of
$PQ$ are 0 beyond the $(m+n)$th, and the $(m+n)$th is $a_mb_n$. This in principle could
be 0 but is nonzero if R is an integral domain. Thus P and Q nonzero implies

$$\deg(PQ) \leq \deg P + \deg Q \quad \text{for general } R,$$
$$\deg(PQ) = \deg P + \deg Q \quad \text{if } R \text{ is an integral domain}.$$

It follows in particular that $R[X]$ is an integral domain if R is.
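When R is not an integral domain, the inequality for $\deg(PQ)$ can be strict. The following small Python check works over $\mathbb{Z}/4\mathbb{Z}$. It assumes that coefficients are stored as integers reduced modulo $m$, and the helpers `poly_mul_mod` and `degree` are hypothetical. The example is $(1 + 2X)^2 = 1 + 4X + 4X^2$, which is the constant 1 modulo 4.

```python
def poly_mul_mod(a, b, m):
    """Product of coefficient lists [a_0, a_1, ...] over Z/mZ."""
    if not a or not b:
        return []
    c = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] = (c[i + j] + ai * bj) % m
    return c

def degree(p):
    """Largest index with a nonzero coefficient; -inf for the zero polynomial."""
    for n in range(len(p) - 1, -1, -1):
        if p[n] != 0:
            return n
    return float("-inf")

p = [1, 2]                      # 1 + 2X, of degree 1 over Z/4Z
prod = poly_mul_mod(p, p, 4)    # coefficients (1, 4, 4) reduced mod 4
print(prod, degree(prod))       # [1, 0, 0] 0, strictly less than 1 + 1
```

Over $\mathbb{Z}/5\mathbb{Z}$, an integral domain, the same product has degree 2, as the equality case predicts.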

Normally we shall write out specific polynomials using the informal notation
with powers of X, using the more precise notation with tuples only when some
ambiguity might otherwise result.

In the special case that R is a field, Section 1.3 introduced the notion of
evaluation of a polynomial $P(X)$ at a point $r$ in the field, thus providing a mapping
$P(X) \mapsto P(r)$ from $R[X]$ to R for each $r$ in R. We listed a number of properties
of this mapping, and they can be summarized in our present language by the
statement that the mapping is a homomorphism of rings. Evaluation is a special
case of a more sweeping property of polynomials given in the next proposition
as a universal mapping property of $R[X]$.


Proposition 4.24. Let R be a nonzero commutative ring with identity, and
let $\iota : R \to R[X]$ be the identification of R with constant polynomials. If T is
any commutative ring with identity, if $\varphi : R \to T$ is a homomorphism of rings
sending 1 into 1, and if $t$ is in T, then there exists a unique homomorphism of
rings $\Phi : R[X] \to T$ carrying identity to identity such that $\Phi(\iota(r)) = \varphi(r)$ for
all $r \in R$ and $\Phi(X) = t$.
Remarks. The mapping $\Phi$ is called the substitution homomorphism extending
$\varphi$ and substituting $t$ for X, and the mapping is written $P(X) \mapsto P^{\varphi}(t)$.
The notation means that $\varphi$ is to be applied to the coefficients of P and then X is
to be replaced by $t$. A diagram of this homomorphism as a universal mapping
property appears in Figure 4.7. In the special case that $T = R$ and $\varphi$ is the
identity, $\Phi$ reduces to evaluation at $t$, and the mapping is written $P(X) \mapsto P(t)$,
just as in Section 1.3.

Figure 4.7. Substitution homomorphism for polynomials in one indeterminate. (Diagram: $\varphi : R \to T$, $\iota : R \to R[X]$, $\Phi : R[X] \to T$.)


Proof. Define $\Phi(a_0, a_1, \ldots, a_n, 0, \ldots) = \varphi(a_0) + \varphi(a_1)t + \cdots + \varphi(a_n)t^n$.
It is immediate that $\Phi$ is a homomorphism of rings sending the identity
$\iota(1) = (1, 0, 0, \ldots)$ of $R[X]$ to the identity $\varphi(1)$ of T. If $r$ is in R, then
$\Phi(\iota(r)) = \Phi(r, 0, 0, \ldots) = \varphi(r)$. Also, $\Phi(X) = \Phi(0, 1, 0, 0, \ldots) = t$. This proves
existence. Uniqueness follows since $\iota(R)$ and X generate $R[X]$ and since a
homomorphism defined on $R[X]$ is therefore determined by its values on $\iota(R)$
and X. □
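The defining formula for $\Phi$ can be evaluated by Horner's rule. The following is a minimal Python sketch with these assumptions:

- $R = T = \mathbb{C}$, using Python's `complex` type.
- $\varphi$ is complex conjugation.
- Polynomials are coefficient lists $[a_0, \ldots, a_n]$.

The names `substitute`, `poly_mul`, and `conj` are illustrative. The sample values have small integer parts, so the floating-point arithmetic is exact here, but the check still allows a tiny tolerance.

```python
def poly_mul(a, b):
    """Product of coefficient lists [a_0, ..., a_n] (any numeric type)."""
    c = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] += ai * bj
    return c

def substitute(coeffs, phi, t):
    """Return phi(a_0) + phi(a_1) t + ... + phi(a_n) t^n, computed by Horner's rule."""
    result = 0
    for a in reversed(coeffs):   # start with the leading coefficient a_n
        result = result * t + phi(a)
    return result

def conj(z):
    return z.conjugate()

P = [1 + 2j, 3j]         # P(X) = (1 + 2i) + 3i X
Q = [2, 1 - 1j]          # Q(X) = 2 + (1 - i) X
t = 2 + 1j
lhs = substitute(poly_mul(P, Q), conj, t)
rhs = substitute(P, conj, t) * substitute(Q, conj, t)
print(abs(lhs - rhs) < 1e-12)   # True: Phi(PQ) = Phi(P) Phi(Q)
```

With `phi = lambda r: r`, the same function is ordinary evaluation; for example, `substitute([1, 0, 1], lambda r: r, 3)` returns 10.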

The formulation of the proposition with the general $\varphi : R \to T$, rather than just
the identity mapping on R, allows several kinds of applications besides the routine
evaluation mapping. An example of one kind occurs when $R = \mathbb{C}$, $T = \mathbb{C}[X]$,
and $\varphi : \mathbb{C} \to \mathbb{C}[X]$ is the composition of complex conjugation on $\mathbb{C}$ followed
by the identification of complex numbers with constant polynomials in $\mathbb{C}[X]$; the
proposition (with $t = X$) then says that complex conjugation of the coefficients of a member
of $\mathbb{C}[X]$ is a ring homomorphism. This observation simplifies the solution of
Problem 7 in Chapter I. Similarly one can set up matters so that the proposition
shows the passage from $\mathbb{Z}[X]$ to $(\mathbb{Z}/m\mathbb{Z})[X]$ by reduction of coefficients modulo
$m$ to be a ring homomorphism.
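The reduction-of-coefficients example can be spot-checked numerically. The following Python sketch assumes that members of $\mathbb{Z}[X]$ are integer coefficient lists and that members of $(\mathbb{Z}/m\mathbb{Z})[X]$ are lists of residues in $\{0, \ldots, m-1\}$. The names `mul` and `reduce_mod` are hypothetical. Checking one pair of polynomials illustrates the homomorphism property; it does not prove it.

```python
def mul(a, b, m=None):
    """Multiply coefficient lists; reduce every coefficient mod m when m is given."""
    c = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] += ai * bj
    return [x % m for x in c] if m is not None else c

def reduce_mod(p, m):
    """Image of p in (Z/mZ)[X]: reduce every coefficient modulo m."""
    return [a % m for a in p]

m = 6
P = [7, -3, 4]      # 7 - 3X + 4X^2 in Z[X]
Q = [-5, 11]        # -5 + 11X in Z[X]
print(reduce_mod(mul(P, Q), m) == mul(reduce_mod(P, m), reduce_mod(Q, m), m))  # True
```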

Still  a  third  kind  of  application  is  to  take  T  in  the  proposition  to  be  a  ring  with  the 
same kind of universal mapping property that $R[X]$ has, and the consequence is an
abstract characterization of $R[X]$. We carry out the details below as Proposition
4.25.  This  result  will  be  applied  later  in  this  section  to  the  several-indeterminate 
case  to  show  that  introducing  several  indeterminates  at  once  yields  the  same  ring, 
up  to  canonical  isomorphism,  as  introducing  them  one  at  a  time. 

Proposition 4.25. Let R and S be nonzero commutative rings with identity,
let $X'$ be an element of S, and suppose that $\iota' : R \to S$ is a one-one ring
homomorphism of R into S carrying 1 to 1. Suppose further that $(S, \iota', X')$
has the following property: whenever T is a commutative ring with identity,
$\varphi : R \to T$ is a homomorphism of rings sending 1 into 1, and $t$ is in T, then there
exists a unique homomorphism $\Phi' : S \to T$ carrying identity to identity such
that $\Phi'(\iota'(r)) = \varphi(r)$ for all $r \in R$ and $\Phi'(X') = t$. Then there exists a unique
homomorphism of rings $\Phi : R[X] \to S$ such that $\Phi \circ \iota = \iota'$ and $\Phi(X) = X'$,
and $\Phi$ is an isomorphism.

Remark. A somewhat weaker conclusion than in the proposition is that any
triple $(S, \iota', X')$ having the same universal mapping property as $(R[X], \iota, X)$ is
isomorphic to $(R[X], \iota, X)$, the isomorphism being unique.

Proof. In the universal mapping property for S, take $T = R[X]$, $\varphi = \iota$, and
$t = X$. The hypothesis gives us a ring homomorphism $\Phi' : S \to R[X]$ with
$\Phi'(1) = 1$, $\Phi' \circ \iota' = \iota$, and $\Phi'(X') = X$. Next apply Proposition 4.24 with
$T = S$, $\varphi = \iota'$, and $t = X'$. We obtain a ring homomorphism $\Phi : R[X] \to S$
with $\Phi(1) = 1$, $\Phi \circ \iota = \iota'$, and $\Phi(X) = X'$. Then $\Phi' \circ \Phi$ is a ring homomorphism
from $R[X]$ to itself carrying 1 to 1, fixing X, and having $\Phi' \circ \Phi \circ \iota = \iota$. From
the uniqueness in Proposition 4.24 when $T = R[X]$, $\varphi = \iota$, and $t = X$, we see
that $\Phi' \circ \Phi$ is the identity on $R[X]$. Reversing the roles of $\Phi$ and $\Phi'$ and applying
the uniqueness in the universal mapping property for S, we see that $\Phi \circ \Phi'$ is the
identity on S. Therefore $\Phi$ is an isomorphism and may be taken as the one in the
statement of the proposition. This proves existence for $\Phi$, and uniqueness follows since
$\iota(R)$ and X together generate $R[X]$ and since $\Phi$ is a homomorphism. □

If P is a polynomial over R in one indeterminate and $r$ is in R, then $r$ is a
root of P if $P(r) = 0$. We know as a consequence of Corollary 1.14 that for
any prime $p$, any polynomial in $\mathbb{F}_p[X]$ of degree $n \geq 1$ has at most $n$ roots. This
result does not extend to $\mathbb{Z}/m\mathbb{Z}$ for all positive integers $m$: when $m = 8$, the
polynomial $X^2 - 1$ has 4 roots, namely 1, 3, 5, 7. This result about $\mathbb{F}_p[X]$ has
the following consequence.

Proposition 4.26. If F is a field, then any finite subgroup of the multiplicative
group $F^\times$ is cyclic.

Proof. Let C be a subgroup of $F^\times$ of finite order $n$. Lagrange's Theorem
(Corollary 4.8) shows that the order of each element of C divides $n$. With $h$
defined as the maximum order of an element of C, it is enough to show that
$h = n$. Let $a$ be an element of order $h$. The polynomial $X^h - 1$ has at most $h$
roots by Corollary 1.14, and $a$ is one of them, by definition of "order." If $h < n$,
then it follows that some member $b$ of C is not a root of $X^h - 1$. The order $h'$
of $b$ is then a divisor of $n$ but cannot be a divisor of $h$ since otherwise we would
have $b^h = (b^{h'})^{h/h'} = 1^{h/h'} = 1$. Consequently there exists a prime $p$ such that
some power $p^r$ of $p$ divides $h'$ but not $h$. Let $p^s$ be the exact power of $p$
dividing $h$, so that $s < r$, and write $h = mp^s$, so that $\mathrm{GCD}(m, p^r) = 1$ and $a' = a^{p^s}$ has order
$m$. Put $q = h'/p^r$, so that $b' = b^q$ has order $p^r$. The proof will be completed
by showing that $c = a'b'$ has order $mp^r = hp^{r-s} > h$, in contradiction to the
maximality of $h$.

Let $t$ be the order of $c$. On the one hand, from
$c^{mp^r} = (a')^{mp^r}(b')^{mp^r} = a^{mp^{r+s}}b^{mp^rq} = a^{hp^r}b^{mh'} = (a^h)^{p^r}(b^{h'})^m = 1$,
we see that $t$ divides $mp^r$. On the other hand, $1 = c^t$ says that $(a')^t = (b')^{-t}$.
Raising both sides to the $p^r$th power gives $1 = ((b')^{p^r})^{-t} = (a')^{tp^r}$, and hence
$m$ divides $tp^r$; by Corollary 1.3, $m$ divides $t$. Raising both sides of $(a')^t = (b')^{-t}$
to the $m$th power gives $1 = ((a')^m)^t = (b')^{-tm}$, and hence $p^r$ divides $tm$; by Corollary
1.3, $p^r$ divides $t$. Applying Corollary 1.4, we conclude that $mp^r$ divides $t$. Therefore
$t = mp^r$, and the proof is complete. □

Corollary  4.27.  The  multiplicative  group  of  a  finite  field  is  cyclic. 

Proof.  This  is  a  special  case  of  Proposition  4.26.  □ 
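For the prime fields $\mathbb{F}_p = \mathbb{Z}/p\mathbb{Z}$, Corollary 4.27 can be illustrated by brute force: compute the order of each nonzero residue and look for one of order $p - 1$. The following Python sketch uses hypothetical helper names. For contrast, $\mathbb{Z}/8\mathbb{Z}$ is not a field, and there every unit has order at most 2. So all four units 1, 3, 5, 7 are roots of $X^2 - 1$, and the unit group is not cyclic.

```python
from math import gcd

def mult_order(a, m):
    """Order of a in the group of units of Z/mZ (assumes gcd(a, m) = 1)."""
    k, x = 1, a % m
    while x != 1:
        x = (x * a) % m
        k += 1
    return k

def generator(p):
    """Smallest residue of order p - 1, i.e., a generator of the cyclic group F_p^x."""
    for a in range(1, p):
        if mult_order(a, p) == p - 1:
            return a
    return None

print(generator(7), generator(11), generator(13))   # 3 2 2
units_mod_8 = [a for a in range(1, 8) if gcd(a, 8) == 1]
print(max(mult_order(a, 8) for a in units_mod_8))    # 2, so (Z/8Z)^x is not cyclic
```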

A finite field F can have a nonzero polynomial that is 0 at every element of F.
Indeed, every element of $\mathbb{F}_p$ is a root of $X^p - X$, as a consequence of Fermat's
Little  Theorem.  It  is  for  this  reason  that  it  is  unwise  to  confuse  a  polynomial  in 
an  indeterminate  with  a  “polynomial  function.” 
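For $\mathbb{F}_5$ this can be checked directly. The following Python sketch uses hypothetical helper names. It multiplies out $\prod_r (X - r)$ over all residues $r$ modulo $m$ and evaluates the result at every residue. For $m = 5$ the product comes out to $X^5 - X$. For $m = 6$, $\mathbb{Z}/6\mathbb{Z}$ is only a finite commutative ring with identity. Even so, the product is still a monic, hence nonzero, polynomial that vanishes at every element.

```python
def mul_mod(a, b, m):
    """Product of coefficient lists [a_0, a_1, ...] over Z/mZ."""
    c = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] = (c[i + j] + ai * bj) % m
    return c

def vanishing_poly(m):
    """Coefficients [c_0, ..., c_m] of the product of (X - r) over r in Z/mZ."""
    p = [1]
    for r in range(m):
        p = mul_mod(p, [(-r) % m, 1], m)   # multiply by X - r
    return p

def evaluate(p, x, m):
    return sum(c * pow(x, k, m) for k, c in enumerate(p)) % m

print(vanishing_poly(5))   # [0, 4, 0, 0, 0, 1], i.e. X^5 - X since -1 = 4 mod 5
for m in (5, 6):
    p = vanishing_poly(m)
    print(m, p[-1], all(evaluate(p, x, m) == 0 for x in range(m)))  # monic, zero function
```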

Let us make the notion of a polynomial function of one variable rigorous. If
$P(X)$ is a polynomial with coefficients in the commutative ring R with identity,
then Proposition 4.24 gives us an evaluation homomorphism $P \mapsto P(r)$ for each
$r$ in R. The function $r \mapsto P(r)$ from R into R is the polynomial function
associated to the polynomial P. This function is a member of the commutative
ring of all R-valued functions on R, and the mapping $P \mapsto (r \mapsto P(r))$ is
a homomorphism of rings. What we know from Corollary 1.14 is that this
homomorphism is one-one if R is an infinite field. A negative result is that
if R is a finite commutative ring with identity, then $\prod_{r \in R}(X - r)$ is a monic, hence
nonzero, polynomial that maps to the 0 function, and hence the homomorphism is not
one-one. A more general positive result than the one above for infinite fields is the following.

Proposition  4.28. 

(a) If R is a nonzero commutative ring with identity and $P(X)$ is a member of
$R[X]$ with a root $r$, then $P(X) = (X - r)Q(X)$ for some $Q(X)$ in $R[X]$.

(b) If R is an integral domain, then a nonzero member of $R[X]$ of degree $n$
has at most $n$ roots.

(c) If R is an infinite integral domain, then the ring homomorphism of $R[X]$
to the ring of polynomial functions from R to R, given by evaluation, is one-one.
Proof. For (a), we proceed by induction on the degree of P, the base case of
the induction being degree $\leq 0$ (a constant polynomial with a root is 0, and
$0 = (X - r) \cdot 0$). If the conclusion has been proved for degree $< n$
with $n \geq 1$, let the leading term of P be $a_nX^n$. Then $P(X) = a_n(X - r)^n + A(X)$
with $\deg A < n$. Evaluation at $r$ gives, by virtue of Proposition 4.24, $0 = 0 + A(r)$.
By the inductive hypothesis, $A(X) = (X - r)B(X)$. Then $P(X) = (X - r)Q(X)$
with $Q(X) = a_n(X - r)^{n-1} + B(X)$, and the induction is complete.

For (b), let $P(X)$ have degree $n$ with at least $n + 1$ distinct roots $r_1, \ldots, r_{n+1}$.
Part (a) shows that $P(X) = (X - r_1)P_1(X)$ with $\deg P_1 = n - 1$. Also,
$0 = P(r_2) = (r_2 - r_1)P_1(r_2)$. Since $r_2 - r_1 \neq 0$ and since R has no zero divisors,
$P_1(r_2) = 0$. Part (a) then shows that $P_1(X) = (X - r_2)P_2(X)$, and substitution
gives $P(X) = (X - r_1)(X - r_2)P_2(X)$. Continuing in this way, we obtain
$P(X) = (X - r_1) \cdots (X - r_n)P_n(X)$ with $\deg P_n = 0$. Since $P \neq 0$, $P_n \neq 0$.
So $P_n$ is a nonzero constant polynomial $P_n(X) = c \neq 0$. Evaluating at $r_{n+1}$, we
obtain $0 = (r_{n+1} - r_1) \cdots (r_{n+1} - r_n)c$ with each factor nonzero, in contradiction
to the fact that R is an integral domain.

For (c), a polynomial in the kernel of the ring homomorphism has every member
of  R  as  a  root.  If  R  is  infinite,  (b)  shows  that  such  a  polynomial  is  necessarily 
the  zero  polynomial.  Thus  the  kernel  is  0,  and  the  ring  homomorphism  has  to  be 
one-one.  □ 
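The induction in (a) shows that $Q$ exists. For the linear factor $X - r$, $Q$ can also be computed directly by synthetic division, which produces $Q$ together with the remainder $P(r)$, so that $P(X) = (X - r)Q(X) + P(r)$. The following Python sketch works over the integers and assumes coefficient lists $[a_0, \ldots, a_n]$; `divide_by_linear` is a hypothetical name. Only ring operations are used, so the same loop works over any commutative ring with identity.

```python
def divide_by_linear(coeffs, r):
    """Return (q, rem) with P(X) = (X - r) q(X) + rem, for coeffs = [a_0, ..., a_n]."""
    n = len(coeffs) - 1
    if n < 1:                      # constant (or zero) polynomial: quotient is 0
        return [], (coeffs[0] if coeffs else 0)
    q = [0] * n
    carry = 0
    for k in range(n, 0, -1):      # q_{k-1} = a_k + r q_k, starting from q_{n-1} = a_n
        carry = coeffs[k] + carry * r
        q[k - 1] = carry
    rem = coeffs[0] + carry * r    # equals P(r)
    return q, rem

P = [-6, 11, -6, 1]                # X^3 - 6X^2 + 11X - 6 = (X - 1)(X - 2)(X - 3)
print(divide_by_linear(P, 1))      # ([6, -5, 1], 0): r = 1 is a root
print(divide_by_linear(P, 4))      # ([3, -2, 1], 6): remainder P(4) = 6
```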

Let us turn our attention to polynomials in several indeterminates. Fix the
nonzero commutative ring R with identity, and let $n$ be a positive integer. Informally
a polynomial over R in $n$ indeterminates is to be a finite sum

$$\sum_{j_1 \geq 0, \ldots, j_n \geq 0} r_{j_1, \ldots, j_n} X_1^{j_1} \cdots X_n^{j_n}$$

with each $r_{j_1, \ldots, j_n}$ in R. To make matters precise, we work just with the system of
coefficients, just as in the case of one indeterminate.

Let J be the set of integers $\geq 0$, and let $J^n$ be the set of $n$-tuples of elements of
J. A member of $J^n$ may be written as $j = (j_1, \ldots, j_n)$. Addition of members of
$J^n$ is defined coordinate by coordinate. Thus $j + j' = (j_1 + j'_1, \ldots, j_n + j'_n)$ if
$j = (j_1, \ldots, j_n)$ and $j' = (j'_1, \ldots, j'_n)$. A polynomial in $n$ indeterminates with
coefficients in R is a function $f : J^n \to R$ such that $f(j) \neq 0$ for only finitely
many $j \in J^n$. Temporarily let us write S for the set of all such polynomials for a
particular $n$. If $f$ and $g$ are two such polynomials, their sum $h$ and product $k$ are
the polynomials defined by

$$h(j) = f(j) + g(j),$$
$$k(i) = \sum_{j + j' = i} f(j)g(j').$$

Under these definitions, S is a commutative ring.
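The definition of S transcribes directly into code if a polynomial is stored as a finite map from exponent tuples $j \in J^n$ to nonzero coefficients. The following Python sketch works over the integers. The dictionary representation and the name `mv_mul` are our choices, not the book's. The product implements $k(i) = \sum_{j + j' = i} f(j)g(j')$.

```python
from collections import defaultdict

def mv_mul(f, g):
    """Product k(i) = sum over j + j' = i of f(j) g(j'), for dicts {exponent tuple: coeff}."""
    k = defaultdict(int)
    for j, a in f.items():
        for jp, b in g.items():
            i = tuple(x + y for x, y in zip(j, jp))   # coordinatewise sum in J^n
            k[i] += a * b
    return {i: c for i, c in k.items() if c != 0}     # keep only the nonzero values

s = {(1, 0): 1, (0, 1): 1}            # X1 + X2
print(mv_mul(s, s))                   # {(2, 0): 1, (1, 1): 2, (0, 2): 1}
```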




Define a mapping $\iota : R \to S$ by

$$\iota(r)(j) = \begin{cases} r & \text{if } j = (0, \ldots, 0), \\ 0 & \text{otherwise.} \end{cases}$$

Then $\iota$ is a one-one homomorphism of rings, $\iota(0)$ is the zero element of S and is
called simply 0, and $\iota(1)$ is a multiplicative identity for S. The polynomials in
the image of $\iota$ are called the constant polynomials.

For $1 \leq k \leq n$, let $e_k$ be the member of $J^n$ that is 1 in the $k$th place and is 0
elsewhere. Define $X_k$ to be the polynomial that assigns 1 to $e_k$ and assigns 0 to all
other members of $J^n$. We say that $X_k$ is an indeterminate. If $j = (j_1, \ldots, j_n)$
is in $J^n$, define $X^j$ to be the product

$$X^j = X_1^{j_1} \cdots X_n^{j_n}.$$

If $r$ is in R, we allow ourselves to abbreviate $\iota(r)X^j$ as $rX^j$, and any such polynomial
is called a monomial. The monomial $rX^j$ is the polynomial that assigns $r$ to
$j$ and assigns 0 to all other members of $J^n$. Then it follows immediately from the
definitions that each polynomial has a unique expansion as a finite sum of nonzero
monomials. Thus the most general member of S is of the form $\sum_{j \in J^n} r_jX^j$ with
only finitely many nonzero terms. This is called the monomial expansion of the
given polynomial.

We may now write $R[X_1, \ldots, X_n]$ for S. A polynomial $\sum_{j \in J^n} r_jX^j$ may
be conveniently abbreviated as P or as $P(X)$ or as $P(X_1, \ldots, X_n)$ when its
monomial expansion is either understood or irrelevant.

The degree of the 0 polynomial is defined for this section to be $-\infty$, and the
degree of any monomial $rX^j$ with $r \neq 0$ is defined to be the integer

$$|j| = j_1 + \cdots + j_n \quad \text{if } j = (j_1, \ldots, j_n).$$

Finally the degree of any nonzero polynomial P, denoted by $\deg P$, is defined to
be the maximum of the degrees of the terms in its monomial expansion. If all the
nonzero monomials in the monomial expansion of a polynomial P have the same
degree $d$, then P is said to be homogeneous of degree $d$. Under these definitions
the 0 polynomial has degree $-\infty$ but is homogeneous of every degree. If P and
Q are homogeneous polynomials of degrees $d$ and $d'$, then $PQ$ is homogeneous
of degree $d + d'$ (and possibly equal to the 0 polynomial).

In any event, by grouping terms in the monomial expansion of a polynomial
according to their degree, we see that every polynomial is uniquely the sum
of nonzero homogeneous polynomials of distinct degrees. Let us call this the
homogeneous-polynomial expansion of the given polynomial. Let us expand
two such nonzero polynomials P and Q in this fashion, writing $P = P_{d_1} + \cdots + P_{d_k}$
and $Q = Q_{d'_1} + \cdots + Q_{d'_l}$ with $d_1 < \cdots < d_k$ and $d'_1 < \cdots < d'_l$. Then we see
directly that

$$\deg(P + Q) \leq \max(\deg P, \deg Q),$$
$$\deg(PQ) \leq \deg P + \deg Q.$$

In the formula for $\deg(P + Q)$, the term that is potentially of largest degree is
$P_{d_k} + Q_{d'_l}$, and it is of degree $\max(\deg P, \deg Q)$ if $\deg P \neq \deg Q$. In the
formula for $\deg(PQ)$, the term that is potentially of largest degree is $P_{d_k}Q_{d'_l}$. It
is homogeneous of degree $d_k + d'_l$, but it could be 0. Some proof is required that
it is not 0 if R is an integral domain, as follows.

Proposition 4.29. If R is an integral domain, then $R[X_1, \ldots, X_n]$ is an integral
domain.

Proof. Let P and Q be nonzero homogeneous polynomials with $\deg P = d$
and $\deg Q = d'$. We are to prove that $PQ \neq 0$. We introduce an ordering on the
set of all members $j$ of $J^n$, saying $j = (j_1, \ldots, j_n) > j' = (j'_1, \ldots, j'_n)$ if there
is some $k$ such that $j_i = j'_i$ for $i < k$ and $j_k > j'_k$. In the monomial expansion
of P as $P(X) = \sum_{|j| = d} a_jX^j$, let $i$ be the largest $n$-tuple $j$ in the ordering such
that $a_j \neq 0$. Similarly with $Q(X) = \sum_{|j'| = d'} b_{j'}X^{j'}$, let $i'$ be the largest $n$-tuple
$j'$ in the ordering such that $b_{j'} \neq 0$. Then

$$P(X)Q(X) = a_ib_{i'}X^{i+i'} + \sum_{\substack{j, j' \text{ with} \\ (j, j') \neq (i, i')}} a_jb_{j'}X^{j+j'},$$

and all terms with nonzero coefficient in the sum on the right side have $j + j' < i + i'$,
since for them $j \leq i$ and $j' \leq i'$ with at least one inequality strict, and the ordering
is preserved under addition in $J^n$. Thus $a_ib_{i'}X^{i+i'}$ is the only term in the monomial
expansion of $P(X)Q(X)$ involving the monomial $X^{i+i'}$. Since R is an integral domain
and $a_i$ and $b_{i'}$ are nonzero, $a_ib_{i'}$ is nonzero. Thus $P(X)Q(X)$ is nonzero. □
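Python compares tuples lexicographically, which is exactly the ordering on $J^n$ used in this proof, so the largest exponent can be found with `max`. The following sketch uses hypothetical names and checks one example over $\mathbb{Z}$. It confirms that the largest exponent of $PQ$ is $i + i'$ and that its coefficient is $a_ib_{i'}$.

```python
from collections import defaultdict

def mv_mul(f, g):
    """Product of polynomials stored as dicts {exponent tuple: coefficient}."""
    k = defaultdict(int)
    for j, a in f.items():
        for jp, b in g.items():
            k[tuple(x + y for x, y in zip(j, jp))] += a * b
    return {i: c for i, c in k.items() if c != 0}

def lex_lead(f):
    """Lexicographically largest exponent tuple with nonzero coefficient."""
    return max(j for j, c in f.items() if c != 0)

P = {(2, 0): 3, (1, 1): -1, (0, 2): 5}   # 3X1^2 - X1X2 + 5X2^2, homogeneous of degree 2
Q = {(0, 1): 2, (1, 0): -4}               # 2X2 - 4X1, homogeneous of degree 1
PQ = mv_mul(P, Q)
i, ip = lex_lead(P), lex_lead(Q)
top = tuple(x + y for x, y in zip(i, ip))
print(lex_lead(PQ) == top, PQ[top] == P[i] * Q[ip])   # True True
```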

Proposition 4.30. Let R be a nonzero commutative ring with identity, let
$R[X_1, \ldots, X_n]$ be the ring of polynomials in $n$ indeterminates, and define
$\iota : R \to R[X_1, \ldots, X_n]$ to be the identification of R with constant polynomials.
If T is any commutative ring with identity, if $\varphi : R \to T$ is a homomorphism
of rings sending 1 into 1, and if $t_1, \ldots, t_n$ are in T, then there exists a unique
homomorphism $\Phi : R[X_1, \ldots, X_n] \to T$ carrying identity to identity such that
$\Phi(\iota(r)) = \varphi(r)$ for all $r \in R$ and $\Phi(X_j) = t_j$ for $1 \leq j \leq n$.

Remarks. The mapping $\Phi$ is called the substitution homomorphism extending
$\varphi$ and substituting $t_j$ for $X_j$ for $1 \leq j \leq n$, and the mapping is written
$P(X_1, \ldots, X_n) \mapsto P^{\varphi}(t_1, \ldots, t_n)$. The notation means that $\varphi$ is to be applied
to each coefficient of P and then $X_1, \ldots, X_n$ are to be replaced by $t_1, \ldots, t_n$.
A diagram of this homomorphism as a universal mapping property appears
in Figure 4.8. In the special case that $T = R$ and $\varphi$ is the identity, $\Phi$ reduces to
evaluation at the point $(t_1, \ldots, t_n)$ of $R \times \cdots \times R$ (cf. Example 3 of
homomorphisms in Section 4), and the mapping is written $P(X_1, \ldots, X_n) \mapsto P(t_1, \ldots, t_n)$.

Figure 4.8. Substitution homomorphism for polynomials in $n$ indeterminates. (Diagram: $\varphi : R \to T$, $\iota : R \to R[X_1, \ldots, X_n]$, $\Phi : R[X_1, \ldots, X_n] \to T$.)

Proof. If $P(X_1, \ldots, X_n) = \sum_{j_1, \ldots, j_n} a_{j_1, \ldots, j_n}X_1^{j_1} \cdots X_n^{j_n}$ is the monomial
expansion of a member P of $R[X_1, \ldots, X_n]$, then $\Phi(P)$ is defined to be the corresponding
finite sum $\sum_{j_1 \geq 0, \ldots, j_n \geq 0} \varphi(a_{j_1, \ldots, j_n})t_1^{j_1} \cdots t_n^{j_n}$. Existence readily follows,
and uniqueness follows since $\iota(R)$ and $X_1, \ldots, X_n$ generate $R[X_1, \ldots, X_n]$ and
since $\Phi$ is a homomorphism. □
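In the notation of the proof, $\Phi(P)$ sums $\varphi(a_{j_1, \ldots, j_n})t_1^{j_1} \cdots t_n^{j_n}$ over the monomial expansion. The following Python sketch assumes:

- Polynomials are dicts from exponent tuples to integer coefficients.
- $T = \mathbb{Z}/7\mathbb{Z}$.
- $\varphi$ is reduction modulo 7.

The names `substitute` and `phi`, and the sample points, are illustrative only. The final line spot-checks that $\Phi(PQ) = \Phi(P)\Phi(Q)$.

```python
from collections import defaultdict

def mv_mul(f, g):
    """Product of polynomials stored as dicts {exponent tuple: coefficient}."""
    k = defaultdict(int)
    for j, a in f.items():
        for jp, b in g.items():
            k[tuple(x + y for x, y in zip(j, jp))] += a * b
    return dict(k)

def substitute(f, phi, ts, m):
    """Sum of phi(a_j) * t_1^{j_1} ... t_n^{j_n}, computed in Z/mZ."""
    total = 0
    for j, a in f.items():
        term = phi(a)
        for t, e in zip(ts, j):
            term = term * pow(t, e, m) % m
        total = (total + term) % m
    return total

m = 7
def phi(a):
    return a % m               # reduction Z -> Z/7Z

P = {(1, 0): 3, (0, 2): -2}    # 3X1 - 2X2^2
Q = {(0, 0): 5, (1, 1): 1}     # 5 + X1X2
ts = (4, 6)
lhs = substitute(mv_mul(P, Q), phi, ts, m)
rhs = substitute(P, phi, ts, m) * substitute(Q, phi, ts, m) % m
print(lhs == rhs)              # True
```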

Corollary 4.31. If R is a nonzero commutative ring with identity, then
$R[X_1, \ldots, X_{n-1}][X_n]$ is isomorphic as a ring to $R[X_1, \ldots, X_n]$.

Remark.  The  proof  will  show  that  the  isomorphism  is  the  expected  one. 

Proof. In the notation with $n$-tuples and $J^n$, any $(n-1)$-tuple may be identified
with an $n$-tuple by adjoining 0 as its $n$th coordinate, and in this way, every
monomial in $R[X_1, \ldots, X_{n-1}]$ can be regarded as a monomial in $R[X_1, \ldots, X_n]$.
The extension of this mapping to sums gives us a one-one homomorphism of rings
$\iota' : R[X_1, \ldots, X_{n-1}] \to R[X_1, \ldots, X_n]$. We are going to use Proposition 4.25
to prove the isomorphism of rings $R[X_1, \ldots, X_{n-1}][X_n] \cong R[X_1, \ldots, X_n]$. In
the notation of that proposition, the role of R is played by $R[X_1, \ldots, X_{n-1}]$,
we take $S = R[X_1, \ldots, X_n]$, and we have constructed $\iota'$. We are to show that
$(S, \iota', X_n)$ satisfies a certain universal mapping property. Thus suppose that T is a
commutative ring with identity, that $t$ is in T, and that $\varphi' : R[X_1, \ldots, X_{n-1}] \to T$
is a homomorphism of rings carrying identity to identity.

We shall apply Proposition 4.30 in order to obtain the desired homomorphism
$\Phi' : S \to T$. Let $\iota_{n-1} : R \to R[X_1, \ldots, X_{n-1}]$ be the identification of R
with constant polynomials in $R[X_1, \ldots, X_{n-1}]$, and let $\iota_n = \iota' \circ \iota_{n-1}$ be the
identification of R with constant polynomials in S. Define $\varphi : R \to T$ by
$\varphi = \varphi' \circ \iota_{n-1}$, and take $t_n = t$ and $t_j = \varphi'(X_j)$ for $1 \leq j \leq n - 1$. Then
Proposition 4.30 produces a homomorphism of rings $\Phi' : S \to T$ with $\Phi'(\iota_n(r)) = \varphi(r)$
for $r \in R$, $\Phi'(\iota'(X_j)) = \varphi'(X_j)$ for $1 \leq j \leq n - 1$, and $\Phi'(X_n) = t_n$. The equations

$$\Phi'(\iota'(\iota_{n-1}(r))) = \Phi'(\iota_n(r)) = \varphi(r) = \varphi'(\iota_{n-1}(r))$$

and

$$\Phi'(\iota'(X_j)) = \varphi'(X_j)$$

show that the ring homomorphisms $\Phi' \circ \iota'$ and $\varphi'$ agree on $\iota_{n-1}(R)$ and on
$X_1, \ldots, X_{n-1}$, which together generate $R[X_1, \ldots, X_{n-1}]$; hence $\Phi' \circ \iota' = \varphi'$ on
$R[X_1, \ldots, X_{n-1}]$. Also, $\Phi'(X_n) = t_n = t$. Thus the mapping $\Phi'$ sought by
Proposition 4.25 exists. It is unique since $\iota'(R[X_1, \ldots, X_{n-1}])$ and $X_n$ together
generate S. The conclusion from Proposition 4.25 is that S is
isomorphic to $R[X_1, \ldots, X_{n-1}][X_n]$ via the expected isomorphism of rings. □

We  conclude  the  discussion  of  polynomials  in  several  variables  by  making  the 
notion of a polynomial function of several variables rigorous. If $P(X_1, \ldots, X_n)$
is a polynomial in $n$ indeterminates with coefficients in the commutative ring
R with identity, then Proposition 4.30 gives us an evaluation homomorphism
$P \mapsto P(r_1, \ldots, r_n)$ for each $n$-tuple $(r_1, \ldots, r_n)$ of members of R. The function
$(r_1, \ldots, r_n) \mapsto P(r_1, \ldots, r_n)$ from $R \times \cdots \times R$ into R is the polynomial
function associated to the polynomial P. This function is a member of the
commutative ring of all R-valued functions on $R \times \cdots \times R$, and the mapping
$P \mapsto ((r_1, \ldots, r_n) \mapsto P(r_1, \ldots, r_n))$ is a homomorphism of rings.

Corollary 4.32. If R is an infinite integral domain, then the ring homomorphism
of $R[X_1, \ldots, X_n]$ to polynomial functions from $R \times \cdots \times R$ to R, given
by evaluation, is one-one.

Remark.  This  result  extends  Proposition  4.28  to  several  indeterminates. 

Proof. We proceed by induction on $n$, the case $n = 1$ being handled by
Proposition 4.28. Assume the result for $n - 1$ indeterminates. If $P \neq 0$ is in
$R[X_1, \ldots, X_n]$, Corollary 4.31 allows us to write

$$P(X_1, \ldots, X_n) = \sum_{i=0}^{k} P_i(X_1, \ldots, X_{n-1})X_n^i$$

for some $k$, with each $P_i$ in $R[X_1, \ldots, X_{n-1}]$ and with $P_k(X_1, \ldots, X_{n-1}) \neq 0$.
By the inductive hypothesis, $P_k(r_1, \ldots, r_{n-1})$ is nonzero for some elements
$r_1, \ldots, r_{n-1}$ of R. So the polynomial $\sum_{i=0}^{k} P_i(r_1, \ldots, r_{n-1})X_n^i$ in $R[X_n]$ is not
the 0 polynomial, and Proposition 4.28 shows that it is not 0 when evaluated at
some $r_n$. Then $P(r_1, \ldots, r_n) \neq 0$. □

It  is  possible  also  to  introduce  polynomial  rings  in  infinitely  many  variables. 
These  will  play  roles  only  as  counterexamples  in  this  book,  and  thus  we  shall  not 
stop  to  treat  them  in  detail. 

We complete this section with some remarks about vector spaces. The definition
of a vector space over a general field $\mathbb{F}$ remains the same as in Section II.1,
where $\mathbb{F}$ is assumed to be $\mathbb{Q}$ or $\mathbb{R}$ or $\mathbb{C}$. We shall make great use of the fact that all
the results in Chapter II concerning vector spaces remain valid when $\mathbb{Q}$ or $\mathbb{R}$ or
$\mathbb{C}$ is replaced by a general field $\mathbb{F}$. The proofs need no adjustments, and it is not
necessary to write out the details. For the moment we make only the following
application  of  vector  spaces  over  general  fields,  but  the  extended  theory  of  vector 
spaces  will  play  an  important  role  in  most  of  the  remaining  chapters  of  this  book. 

Proposition 4.33. If $\mathbb{F}$ is a finite field, then the number of elements in $\mathbb{F}$ is a
power of a prime.

Remark. We return to this matter in Chapter IX, showing at that time that for
each prime power $p^n > 1$, there is one and only one field with $p^n$ elements, up
to isomorphism.

Proof. The characteristic of $\mathbb{F}$ cannot be 0 since $\mathbb{F}$ is finite, and hence it is some
prime $p$. Denote the prime field of $\mathbb{F}$ by $\mathbb{F}_p$. By restricting the multiplication
so that it is defined only on $\mathbb{F}_p \times \mathbb{F}$, we make $\mathbb{F}$ into a vector space over $\mathbb{F}_p$,
necessarily finite-dimensional. Proposition 2.18 shows that $\mathbb{F}$ is isomorphic as a
vector space to the space $(\mathbb{F}_p)^n$ of $n$-dimensional column vectors for some $n$, and
hence $\mathbb{F}$ must have $p^n$ elements. □


6.  Group  Actions  and  Examples 

Let X be a nonempty set, let $\mathcal{F}(X)$ be the group of invertible functions from X
onto itself, the group operation being composition, and let G be a group. A group
action of G on X is a homomorphism of G into $\mathcal{F}(X)$. When $X = \{1, \ldots, n\}$,
the group $\mathcal{F}(X)$ is just the symmetric group $S_n$. Thus Examples 5-9 of groups
in Section 1 are all in fact subgroups of various groups $\mathcal{F}(X)$ and are therefore
examples of group actions. In particular, every group of permutations of $\{1, \ldots, n\}$, every
dihedral group acting on $\mathbb{R}^2$, and every general linear group or subgroup acting
on a finite-dimensional vector space over $\mathbb{Q}$ or $\mathbb{R}$ or $\mathbb{C}$ or an arbitrary field $\mathbb{F}$
provides an example. So do the orthogonal and unitary groups acting on $\mathbb{R}^n$ and
$\mathbb{C}^n$, as well as the automorphism group of any number field.

We saw an indication in Section 1 that many early examples of groups arose in this way. One source of examples that is of some importance and was not listed in Section 1 occurs in the geometry of $\mathbb{R}^2$. The translations in $\mathbb{R}^2$, together with the rotations about arbitrary points of $\mathbb{R}^2$ and the reflections about arbitrary lines in $\mathbb{R}^2$, form a group $G$ of rigid motions of the plane.¹¹ This group $G$ is a subgroup of $\mathcal{F}(\mathbb{R}^2)$, and thus $G$ acts on $\mathbb{R}^2$. More generally, whenever a nonempty set $X$ has a notion of distance, the set of isometries of $X$, i.e., the distance-preserving members of $\mathcal{F}(X)$, forms a subgroup of $\mathcal{F}(X)$, and thus the group of isometries of $X$ acts on $X$.

¹¹ One can show that $G$ is the full group of rigid motions of $\mathbb{R}^2$, but this fact will not concern us.
At any rate a group action $\tau$ of $G$ on $X$, being a homomorphism of $G$ into $\mathcal{F}(X)$, is of the form $g \mapsto \tau_g$, where $\tau_g$ is in $\mathcal{F}(X)$ and $\tau_{g_1g_2} = \tau_{g_1}\tau_{g_2}$. There is an equivalent way of formulating matters that does not so obviously involve the notion of a homomorphism. Namely, we write $\tau_g(x) = gx$. In this notation the group action becomes a function $G \times X \to X$ with $(g, x) \mapsto gx$ such that

(i) $(g_1g_2)x = g_1(g_2x)$ for all $g_1$ and $g_2$ in $G$ and for all $x$ in $X$ (from the fact that $\tau_{g_1g_2} = \tau_{g_1}\tau_{g_2}$),

(ii) $1x = x$ for all $x$ in $X$ (from the fact that $\tau_1 = 1$).

Conversely if $G \times X \to X$ satisfies (i) and (ii), then the formulas $x = 1x = (gg^{-1})x = g(g^{-1}x)$ and $x = 1x = (g^{-1}g)x = g^{-1}(gx)$ show that the function $x \mapsto gx$ from $X$ to itself is invertible with inverse $x \mapsto g^{-1}x$. Consequently the definition $\tau_g(x) = gx$ makes $g \mapsto \tau_g$ a function from $G$ into $\mathcal{F}(X)$, and (i) shows that $\tau$ is a homomorphism. Thus (i) and (ii) indeed give us an equivalent formulation of the notion of a group action. Both formulations are useful.

Quite often the homomorphism $G \to \mathcal{F}(X)$ of a group action is one-one, and then $G$ can be regarded as a subgroup of $\mathcal{F}(X)$. Here is an important geometric example in which the homomorphism is not one-one.


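Example. Linear fractional transformations. Let $X = \mathbb{C} \cup \{\infty\}$, a set that becomes the Riemann sphere in complex analysis. The group $G = GL(2, \mathbb{C})$ acts on $X$ by the linear fractional transformations
$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}(z) = \frac{az+b}{cz+d},$$
the understanding being that the image of $\infty$ is $ac^{-1}$ and the image of $-dc^{-1}$ is $\infty$ (when $c = 0$, the image of $\infty$ is $\infty$), just as if we were to pass to a limit in each case. Property (ii) of a group action is clear. To verify (i), we simply calculate that
$$\begin{pmatrix} a' & b' \\ c' & d' \end{pmatrix}\left(\begin{pmatrix} a & b \\ c & d \end{pmatrix}(z)\right) = \frac{a'\left(\frac{az+b}{cz+d}\right)+b'}{c'\left(\frac{az+b}{cz+d}\right)+d'} = \frac{(a'a+b'c)z+(a'b+b'd)}{(c'a+d'c)z+(c'b+d'd)} = \left(\begin{pmatrix} a' & b' \\ c' & d' \end{pmatrix}\begin{pmatrix} a & b \\ c & d \end{pmatrix}\right)(z),$$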


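and indeed we have a group action. The homomorphism $G \to \mathcal{F}(X)$ is not one-one, since every nonzero scalar matrix $\begin{pmatrix} \lambda & 0 \\ 0 & \lambda \end{pmatrix}$ acts as the identity. Let $SL(2, \mathbb{R})$ be the subgroup of real matrices in $GL(2, \mathbb{C})$ of determinant 1, and let $Y$ be the subset of $X$ where $\operatorname{Im} z > 0$, not including $\infty$. The members of $SL(2, \mathbb{R})$ carry the subset $Y$ into itself, as we see from the computation
$$\operatorname{Im}\frac{az+b}{cz+d} = \operatorname{Im}\frac{(az+b)(c\bar z+d)}{|cz+d|^2} = \operatorname{Im}\frac{adz+bc\bar z}{|cz+d|^2} = \frac{(ad-bc)\operatorname{Im} z}{|cz+d|^2} = \frac{\operatorname{Im} z}{|cz+d|^2}.$$
Since the effect of a matrix $g^{-1}$ is to invert the effect of $g$, and since both $g$ and $g^{-1}$ carry $Y$ to itself, we conclude that $SL(2, \mathbb{R})$ acts on $Y = \{z \in \mathbb{C} \mid \operatorname{Im} z > 0\}$ by linear fractional transformations. In similar fashion one can verify that the subgroup¹² of $GL(2, \mathbb{C})$
$$\left\{ \begin{pmatrix} \alpha & \beta \\ \bar\beta & \bar\alpha \end{pmatrix} \;\middle|\; \alpha \in \mathbb{C},\ \beta \in \mathbb{C},\ |\alpha|^2 - |\beta|^2 = 1 \right\}$$
acts on $\{z \in \mathbb{C} \mid |z| < 1\}$ by linear fractional transformations.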
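As a numerical sanity check of property (i) and of the formula for $\operatorname{Im}$, here is a minimal Python sketch. It models only finite points $z$ with $cz + d \ne 0$ (the point $\infty$ is left out), and the helper that draws random real matrices of determinant 1 is a hypothetical test device, not something taken from the text.

```python
import random

def lft(m, z):
    """Apply the linear fractional transformation of m = ((a, b), (c, d)) to a
    finite complex number z, assuming c*z + d != 0 (infinity is not modeled here)."""
    (a, b), (c, d) = m
    return (a * z + b) / (c * z + d)

def matmul(m1, m2):
    """Matrix product m1 m2 of 2x2 matrices stored as nested tuples."""
    (a1, b1), (c1, d1) = m1
    (a2, b2), (c2, d2) = m2
    return ((a1 * a2 + b1 * c2, a1 * b2 + b1 * d2),
            (c1 * a2 + d1 * c2, c1 * b2 + d1 * d2))

def random_sl2r(rng):
    """Hypothetical test helper: a random real 2x2 matrix with ad - bc = 1."""
    a = rng.uniform(0.5, 2.0)
    b = rng.uniform(-2.0, 2.0)
    c = rng.uniform(-2.0, 2.0)
    return ((a, b), (c, (1 + b * c) / a))

rng = random.Random(0)
g1, g2 = random_sl2r(rng), random_sl2r(rng)
z = complex(0.3, 1.7)                      # a point of the upper half-plane Y

# Property (i): applying g1 g2 agrees with applying g2 and then g1.
err_action = abs(lft(matmul(g1, g2), z) - lft(g1, lft(g2, z)))

# Im(g z) = Im(z) / |cz + d|^2 for g in SL(2, R), so g z stays in Y.
(a, b), (c, d) = g1
w = lft(g1, z)
err_imag = abs(w.imag - z.imag / abs(c * z + d) ** 2)
print(err_action < 1e-9, err_imag < 1e-9, w.imag > 0)   # True True True
```

Floating-point arithmetic only confirms the identities up to rounding, which is why the comparisons use a small tolerance.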


One group action can yield many others. For example, from an action of $G$ on $X$, we can construct an action on the space of all complex-valued functions on $X$. The definition is $(gf)(x) = f(g^{-1}x)$, the use of the inverse being necessary in order to verify property (i) of a group action:
$$((g_1g_2)f)(x) = f((g_1g_2)^{-1}x) = f((g_2^{-1}g_1^{-1})x) = f(g_2^{-1}(g_1^{-1}x)) = (g_2f)(g_1^{-1}x) = (g_1(g_2f))(x).$$

There is nothing special about the complex numbers as range for the functions here. We can allow any set as range, and we can even allow $G$ to act on the range, as well as on the domain.¹³ If $G$ acts on $X$ and $Y$, then the set of functions from $X$ to $Y$ inherits a group action under the definition
$$(gf)(x) = g(f(g^{-1}x)),$$
as is easily checked. In other words, we are to use $g^{-1}$ where the domain enters the formula and we are to use $g$ where the range enters the formula.
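The role of $g^{-1}$ on the domain side can be checked by brute force. The Python sketch below is an illustration under assumptions of our own: it takes $G = \mathfrak{S}_3$ acting on $X = Y = \{0, 1, 2\}$ (symbols relabeled from 0), stores each function $f : X \to Y$ as a tuple of values, and compares the rule $(gf)(x) = g(f(g^{-1}x))$ with a naive variant that uses $g$ in place of $g^{-1}$.

```python
from itertools import permutations, product

X = range(3)
G = list(permutations(X))        # S_3 acting on X = Y = {0, 1, 2}; g[i] is the image of i
identity = tuple(X)

def compose(g1, g2):
    """(g1 g2)(i) = g1(g2(i)): apply g2 first, then g1."""
    return tuple(g1[g2[i]] for i in X)

def inverse(g):
    inv = [0] * len(g)
    for i, gi in enumerate(g):
        inv[gi] = i
    return tuple(inv)

def act(g, f):
    """(g f)(x) = g(f(g^{-1} x)) for a function f: X -> Y stored as a tuple."""
    g_inv = inverse(g)
    return tuple(g[f[g_inv[x]]] for x in X)

def naive(g, f):
    """The same formula with g in place of g^{-1} on the domain side."""
    return tuple(g[f[g[x]]] for x in X)

def satisfies_i(rule):
    """Check property (i), (g1 g2) f = g1 (g2 f), over all g1, g2 and all 27 functions f."""
    return all(rule(compose(g1, g2), f) == rule(g1, rule(g2, f))
               for g1 in G for g2 in G for f in product(X, repeat=3))

ok_ii = all(act(identity, f) == f for f in product(X, repeat=3))
print(satisfies_i(act), ok_ii, satisfies_i(naive))   # True True False
```

The naive rule fails already for $f$ the identity function, because it would require $g_1g_2 = g_2g_1$ for all $g_1, g_2$.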

¹² This subgroup is commonly called $SU(1, 1)$ for reasons that are not relevant to the current discussion.

¹³ When $\mathbb{C}$ was used as range in the previous display, the group action of $G$ on $\mathbb{C}$ was understood to be trivial in the sense that $gz = z$ for every $g$ in $G$ and $z$ in $\mathbb{C}$.

If $V$ is a vector space over a field $\mathbb{F}$, a representation of $G$ on $V$ is a group action of $G$ on $V$ by linear functions. Specifically for each $g \in G$, $\tau_g$ is to be a member of the group of invertible linear maps from $V$ into itself. Usually one writes $\tau(g)$ instead of $\tau_g$ in representation theory, and thus the condition is that $\tau(g)$ is to be linear for each $g \in G$ and we are to have $\tau(1) = 1$ and $\tau(g_1g_2) = \tau(g_1)\tau(g_2)$ for all $g_1$ and $g_2$. There are interesting examples both when $V$ is finite-dimensional and when $V$ is infinite-dimensional.¹⁴


Examples of representations.

(1) If $m > 1$, then the additive group $\mathbb{Z}/m\mathbb{Z}$ acts linearly on $\mathbb{R}^2$ by
$$\tau(k) = \begin{pmatrix} \cos\frac{2\pi k}{m} & -\sin\frac{2\pi k}{m} \\ \sin\frac{2\pi k}{m} & \cos\frac{2\pi k}{m} \end{pmatrix}, \qquad k \in \{0, 1, 2, \dots, m-1\}.$$
Each $\tau(k)$ is a rotation matrix about the origin through an angle that is a multiple of $2\pi/m$. These transformations of $\mathbb{R}^2$ form a subgroup of the group of symmetries of a regular $m$-gon centered at the origin in $\mathbb{R}^2$.

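(2) The dihedral group $D_3$ acts linearly on $\mathbb{R}^2$ with
$$\tau(1) = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix},\quad \tau(1\,2\,3) = \begin{pmatrix} -\frac12 & -\frac{\sqrt3}{2} \\ \frac{\sqrt3}{2} & -\frac12 \end{pmatrix},\quad \tau(1\,3\,2) = \begin{pmatrix} -\frac12 & \frac{\sqrt3}{2} \\ -\frac{\sqrt3}{2} & -\frac12 \end{pmatrix},$$
$$\tau(1\,2) = \begin{pmatrix} -\frac12 & \frac{\sqrt3}{2} \\ \frac{\sqrt3}{2} & \frac12 \end{pmatrix},\quad \tau(2\,3) = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix},\quad \tau(1\,3) = \begin{pmatrix} -\frac12 & -\frac{\sqrt3}{2} \\ -\frac{\sqrt3}{2} & \frac12 \end{pmatrix}.$$
Each of these matrices carries into itself the equilateral triangle with center at the origin and one vertex at $(1, 0)$. To obtain these matrices, we number the vertices #1, #2, #3 counterclockwise with the vertex at $(1, 0)$ as #1, and we identify each element of $D_3$ with the permutation of $\{1, 2, 3\}$ that it induces on the vertex labels.

(3) The symmetric group $\mathfrak{S}_n$ acts linearly on $\mathbb{R}^n$ by permuting the indices of standard basis vectors. For example, with $n = 3$, we have $(1\,3)e_1 = e_3$, $(1\,3)e_2 = e_2$, etc. The matrices may be computed by the techniques of Section II.3. With $n = 3$, we obtain, for example,
$$(1\,3) \mapsto \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix} \qquad\text{and}\qquad (1\,2\,3) \mapsto \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}.$$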

(4) If $G$ acts on a set $X$, then the corresponding action $(gf)(x) = f(g^{-1}x)$ on complex-valued functions is a representation on the vector space of all complex-valued functions on $X$. This vector space is infinite-dimensional if $X$ is an infinite set. The linearity of the action on functions follows from the definitions of addition and scalar multiplication of functions. In fact, let functions $f_1$ and $f_2$ be given, and let $c$ be a scalar. Then
$$(g(f_1+f_2))(x) = (f_1+f_2)(g^{-1}x) = f_1(g^{-1}x) + f_2(g^{-1}x) = (gf_1)(x) + (gf_2)(x) = (gf_1 + gf_2)(x)$$
and
$$(g(cf_1))(x) = (cf_1)(g^{-1}x) = c(f_1(g^{-1}x)) = c((gf_1)(x)) = (c(gf_1))(x).$$

¹⁴ In some settings a continuity assumption may be added to the definition of a representation, or the field $\mathbb{F}$ may be restricted in some way. We impose no such assumption here at this time.
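For Example (3), a short Python sketch (stdlib only) can confirm that the permutation matrices form a representation of $\mathfrak{S}_3$ and reproduce the two displayed matrices. The encoding of a permutation as a tuple of images, with the symbols relabeled 0, 1, 2, is our own convention, not the text's.

```python
from itertools import permutations

n = 3
S_n = list(permutations(range(n)))   # sigma[j] is the image of j, symbols relabeled 0, ..., n-1

def perm_matrix(sigma):
    """Matrix of tau(sigma), where tau(sigma) e_j = e_{sigma(j)}: entry (i, j) is 1 iff i = sigma(j)."""
    return [[1 if i == sigma[j] else 0 for j in range(n)] for i in range(n)]

def compose(s, t):
    """(s t)(j) = s(t(j))."""
    return tuple(s[t[j]] for j in range(n))

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

# tau(s t) = tau(s) tau(t) for all s, t: the rule is a representation.
is_hom = all(perm_matrix(compose(s, t)) == matmul(perm_matrix(s), perm_matrix(t))
             for s in S_n for t in S_n)
print(is_hom)                   # True

# The two matrices displayed in Example (3), with 1, 2, 3 relabeled as 0, 1, 2.
print(perm_matrix((2, 1, 0)))   # (1 3):   [[0, 0, 1], [0, 1, 0], [1, 0, 0]]
print(perm_matrix((1, 2, 0)))   # (1 2 3): [[0, 0, 1], [1, 0, 0], [0, 1, 0]]
```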

One more important class of group actions consists of those that are closely related to the structure of the group itself. Two simple ones are the action of $G$ on itself by left translations $(g_1, g_2) \mapsto g_1g_2$ and the action of $G$ on itself by right translations $(g_1, g_2) \mapsto g_2g_1^{-1}$. More useful is the action of $G$ on a quotient space $G/H$, where $H$ is a subgroup. This action is given by $(g_1, g_2H) \mapsto g_1g_2H$. There are still others, and some of them are particularly handy in analyzing finite groups. We give some applications in the present section and the next, and we postpone others to Section 10. Before describing some of these actions in detail, let us make some general definitions and establish two easy results.

Let $G \times X \to X$ be a group action. If $p$ is in $X$, then $G_p = \{g \in G \mid gp = p\}$ is a subgroup of $G$ called the isotropy subgroup at $p$ or stabilizer of $G$ at $p$. This is not always a normal subgroup; however, the subgroup $\bigcap_{p \in X} G_p$ that fixes all points of $X$ is the kernel of the homomorphism $G \to \mathcal{F}(X)$ defining the group action, and such a kernel has to be normal.

Let $p$ and $q$ be in $X$. We say that $p$ is equivalent to $q$ for the purposes of this paragraph if $p = gq$ for some $g \in G$. The result is an equivalence relation: it is reflexive since $p = 1p$, it is symmetric since $p = gq$ implies $g^{-1}p = q$, and it is transitive since $p = gq$ and $q = g'r$ together imply $p = (gg')r$. The equivalence classes are called orbits of the group action. The orbit of a point $p$ in $X$ is $Gp = \{gp \mid g \in G\}$. If $Y = Gp$ is an orbit,¹⁵ or more generally if $Y$ is any subset of $X$ carried to itself by every element of $G$, then $G \times Y \to Y$ is a group action. In fact, each function $y \mapsto gy$ is invertible on $Y$ with $y \mapsto g^{-1}y$ as the inverse function, and properties (i) and (ii) of a group action follow from the same properties for $X$.

A group action $G \times X \to X$ is said to be transitive if there is just one orbit, hence if $X = Gp$ for every $p$ in $X$. It is simply transitive if it is transitive and if for each $p$ and $q$ in $X$, there is just one element $g$ of $G$ with $gp = q$.
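Orbits, transitivity, and simple transitivity are easy to test by brute force on a finite example. In the Python sketch below, the examples are our own choices, not the text's: the rotation group $C_4$ of a square acting on the 16 black/white colorings of its vertices (not transitive), and $\mathbb{Z}/4\mathbb{Z}$ acting on itself by left translations (simply transitive).

```python
from itertools import product

def orbits(G, X, act):
    """Partition the finite set X into the orbits Gp = {gp | g in G}."""
    remaining, result = set(X), []
    while remaining:
        p = next(iter(remaining))
        orbit = {act(g, p) for g in G}
        result.append(orbit)
        remaining -= orbit
    return result

def is_transitive(G, X, act):
    return len(orbits(G, X, act)) == 1

def is_simply_transitive(G, X, act):
    """Transitive, and for each p and q exactly one g in G has gp = q."""
    return is_transitive(G, X, act) and all(
        sum(1 for g in G if act(g, p) == q) == 1 for p in X for q in X)

C4 = range(4)        # rotation by g quarter turns, g = 0, 1, 2, 3

def rotate(g, coloring):
    """Rotate a black/white coloring of the square's vertices 0, 1, 2, 3."""
    return tuple(coloring[(i - g) % 4] for i in range(4))

def translate(g, x):
    """Left translation of Z/4Z on itself, written additively."""
    return (g + x) % 4

colorings = list(product((0, 1), repeat=4))
print(len(orbits(C4, colorings, rotate)))             # 6
print(is_transitive(C4, colorings, rotate))           # False
print(is_simply_transitive(C4, list(C4), translate))  # True
```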

¹⁵ Although the notations $G_p$ for the isotropy subgroup and $Gp$ for the orbit are quite distinct in print, it is easy to confuse the two in handwritten mathematics. Some readers may therefore prefer a different notation for one of them. The notation $Z_G(p)$ for the isotropy subgroup is one that is in common use; its use is consistent with the notation for the “centralizer” of an element in a group, which will be defined shortly. Another possibility, used by many mathematicians, is to write $G \cdot p$ for the orbit.
Proposition 4.34. Let $G \times X \to X$ be a group action, let $p$ be in $X$, and let $H$ be the isotropy subgroup at $p$. Then the map $G \to Gp$ given by $g \mapsto gp$ descends to a well-defined map $G/H \to Gp$ that is one-one from $G/H$ onto the orbit $Gp$ and respects the group actions.

Remark. In other words, a group action of $G$ on a single orbit is always isomorphic as a group action to the action of $G$ on some quotient space $G/H$.

PROOF. Let $\varphi : G \to Gp$ be defined by $\varphi(g) = gp$. For $h$ in $H = G_p$, $\varphi(gh) = (gh)p = g(hp) = gp = \varphi(g)$ shows that $\varphi$ descends to a well-defined function $\bar\varphi : G/H \to Gp$, and $\bar\varphi$ is certainly onto $Gp$. If $\bar\varphi(g_1H) = \bar\varphi(g_2H)$, then $g_1p = \varphi(g_1) = \varphi(g_2) = g_2p$, and hence $g_2^{-1}g_1p = p$, $g_2^{-1}g_1$ is in $H$, $g_1$ is in $g_2H$, and $g_1H = g_2H$. Thus $\bar\varphi$ is one-one.

Respecting the group action means that $\bar\varphi(gg'H) = g\bar\varphi(g'H)$, and this identity holds since $g\bar\varphi(g'H) = g\varphi(g') = g(g'p) = (gg')p = \varphi(gg') = \bar\varphi(gg'H)$. □

A simple consequence is the following important counting formula in the case of a group action by a finite group.

Corollary 4.35. Let $G$ be a finite group, let $G \times X \to X$ be a group action, let $p$ be in $X$, let $G_p$ be the isotropy subgroup at $p$, and let $Gp$ be the orbit of $p$. Then $|G| = |G_p|\,|Gp|$.

PROOF. Proposition 4.34 shows that the action of $G$ on some $G/G_p$ is the most general group action on a single orbit, $G_p$ being the isotropy subgroup. Thus the corollary follows from Lagrange's Theorem (Theorem 4.7) with $H = G_p$ and with $G/H$ in one-one correspondence with $Gp$. □
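Corollary 4.35 can be spot-checked on a small action. The Python sketch below uses an example of our own choosing: $\mathfrak{S}_4$ acting on all subsets of $\{0, 1, 2, 3\}$. It verifies $|G| = |G_p|\,|Gp|$ for every subset $p$; for a $k$-element subset the stabilizer has order $k!\,(4-k)!$ and the orbit has $\binom{4}{k}$ elements.

```python
from itertools import combinations, permutations

points = range(4)
G = list(permutations(points))     # S_4; g[i] is the image of i
X = [frozenset(c) for k in range(5) for c in combinations(points, k)]   # all 16 subsets

def act(g, subset):
    """g . A = {g(a) | a in A}."""
    return frozenset(g[a] for a in subset)

table = {}
for p in X:
    stabilizer = [g for g in G if act(g, p) == p]     # isotropy subgroup G_p
    orbit = {act(g, p) for g in G}                    # orbit Gp
    assert len(G) == len(stabilizer) * len(orbit)     # Corollary 4.35
    table[len(p)] = (len(stabilizer), len(orbit))

for k in sorted(table):
    print(k, table[k])   # subset size, (|G_p|, |Gp|): (24, 1), (6, 4), (4, 6), (6, 4), (24, 1)
```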

We turn to applications of group actions to the structure of groups. If $H$ is a subgroup of a group $G$, the index of $H$ in $G$ is the number of elements in $G/H$, finite or infinite. The first application notes a situation in which a subgroup of a finite group is automatically normal.

Proposition 4.36. Let $G$ be a finite group, and let $p$ be the smallest prime dividing the order of $G$. If $H$ is a subgroup of $G$ of index $p$, then $H$ is normal.

Remarks. The most important case is $p = 2$: any subgroup of index 2 is automatically normal, and this conclusion is valid even if $G$ is infinite, as was already pointed out in Example 3 of Section 2. If $G$ is finite and if 2 divides the order of $G$, there need not, however, be any subgroup of index 2; for example, the alternating group $\mathfrak{A}_4$ has order 12, and Problem 11 at the end of the chapter shows that $\mathfrak{A}_4$ has no subgroup of order 6.

PROOF. Let $X = G/H$, and restrict the group action $G \times X \to X$ to an action $H \times X \to X$. The subset $\{1H\}$ is a single orbit under $H$, and the remaining $p - 1$ members of $G/H$ form a union of orbits. Corollary 4.35 shows that the number of elements in an orbit has to be a divisor of $|H|$, and the smallest divisor of $|H|$ other than 1 is $\ge p$ since the smallest divisor of $|G|$ other than 1 equals $p$ and since $|H|$ divides $|G|$. Hence any orbit of $H$ containing more than one element has at least $p$ elements. Since only $p - 1$ elements are left under consideration, each orbit under $H$ contains only one element. Therefore $hgH = gH$ for all $h$ in $H$ and $g$ in $G$. Then $g^{-1}hg$ is in $H$, and we conclude that $H$ is normal. □
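For a concrete check, the Python sketch below (our own illustration) lists the subgroups of $\mathfrak{S}_3$ by brute force. The only subgroup of index 2, the smallest prime dividing 6, is normal, as Proposition 4.36 predicts, while the three subgroups of index 3 are not normal, so the hypothesis that $p$ be the smallest prime cannot simply be dropped.

```python
from itertools import combinations, permutations

G = list(permutations(range(3)))   # S_3, of order 6; the smallest prime divisor of 6 is 2

def mul(s, t):
    return tuple(s[t[i]] for i in range(3))

def inv(s):
    r = [0] * 3
    for i, si in enumerate(s):
        r[si] = i
    return tuple(r)

def is_subgroup(H):
    """A nonempty subset of a finite group that is closed under products is a subgroup."""
    return bool(H) and all(mul(a, b) in H for a in H for b in H)

def is_normal(H):
    return all(mul(mul(g, h), inv(g)) in H for g in G for h in H)

subgroups = [set(c) for k in range(1, len(G) + 1)
             for c in combinations(G, k) if is_subgroup(set(c))]
for H in subgroups:
    print(len(H), len(G) // len(H), is_normal(H))   # order, index, normal?
```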

If $G$ is a group, the center $Z_G$ of $G$ is the set of all elements $x$ such that $gx = xg$ for all $g$ in $G$. The center of $G$ is a subgroup (since $gx = xg$ and $gy = yg$ together imply $g(xy) = xgy = (xy)g$ and $x^{-1}g = x^{-1}(gx)x^{-1} = x^{-1}(xg)x^{-1} = gx^{-1}$), and every subgroup of the center is normal since $x \in Z_G$ and $g \in G$ together imply $gxg^{-1} = x$. Here are examples: the center of a group $G$ is $G$ itself if and only if $G$ is abelian, the center of the quaternion group is $\{\pm 1\}$, and the center of any symmetric group $\mathfrak{S}_n$ with $n \ge 3$ is $\{1\}$.

If $x$ is in $G$, the centralizer of $x$ in $G$, denoted by $Z_G(x)$, is the set of all $g$ such that $gx = xg$. This is a subgroup of $G$, and it equals $G$ itself if and only if $x$ is in the center of $G$. For example the centralizer of $i$ in $H_8$ is the 4-element subgroup $\{\pm 1, \pm i\}$.
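The two quaternion examples above can be confirmed directly. The Python sketch below is an illustration under our own conventions: it realizes $H_8 = \{\pm 1, \pm i, \pm j, \pm k\}$ inside the quaternions via the Hamilton product (with $ij = k$) and computes the center and the centralizer of $i$ by brute force.

```python
def qmul(p, q):
    """Hamilton product of quaternions stored as (real, i, j, k) coefficient tuples."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
            a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
            a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
            a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2)

basis = [(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)]    # 1, i, j, k
H8 = [tuple(s * x for x in u) for u in basis for s in (1, -1)]     # ±1, ±i, ±j, ±k

def centralizer(x):
    return [g for g in H8 if qmul(g, x) == qmul(x, g)]

center = [x for x in H8 if len(centralizer(x)) == len(H8)]
i = (0, 1, 0, 0)
print(center)          # [(1, 0, 0, 0), (-1, 0, 0, 0)], i.e. {±1}
print(centralizer(i))  # [(1, 0, 0, 0), (-1, 0, 0, 0), (0, 1, 0, 0), (0, -1, 0, 0)], i.e. {±1, ±i}
```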

Having made these definitions, we introduce a new group action of $G$ on $G$, namely $(g, x) \mapsto gxg^{-1}$. The orbits are called the conjugacy classes of $G$. If $x$ and $y$ are two elements of $G$, we say that $x$ is conjugate to $y$ if $x$ and $y$ are in the same conjugacy class. In other words, $x$ is conjugate to $y$ if there is some $g$ in $G$ with $gxg^{-1} = y$. The result is an equivalence relation. Let us write $\mathrm{Cl}(x)$ for the conjugacy class of $x$. We can easily compute the isotropy subgroup $G_x$ at $x$ under this action; it consists of all $g \in G$ such that $gxg^{-1} = x$ and hence is exactly the centralizer $Z_G(x)$ of $x$ in $G$. In particular, $\mathrm{Cl}(x) = \{x\}$ if and only if $x$ is in the center $Z_G$. Applying Corollary 4.35, we immediately obtain the following result.

Proposition 4.37. If $G$ is a finite group, then $|G| = |\mathrm{Cl}(x)|\,|Z_G(x)|$ for all $x$ in $G$.

Thus $|\mathrm{Cl}(x)|$ is always a divisor of $|G|$, and it equals 1 if and only if $x$ is in the center $Z_G$. Let us apply these considerations to groups whose order is a power of a prime.

Corollary 4.38. If $G$ is a finite group whose order is a positive power of a prime, then the center $Z_G$ is not $\{1\}$.
PROOF. Let $|G| = p^n$ with $p$ prime and with $n > 0$. The conjugacy classes of $G$ are disjoint and exhaust $G$, and thus the sum of the numbers $|\mathrm{Cl}(x)|$, taken over the distinct conjugacy classes, equals $|G|$. Since $|\mathrm{Cl}(x)| = 1$ if and only if $x$ is in $Z_G$, the sum of $|Z_G|$ and all the $|\mathrm{Cl}(x)|$'s that are not 1 is equal to $|G|$. All the terms $|\mathrm{Cl}(x)|$ that are not 1 are positive powers of $p$, by Proposition 4.37, and so is $|G|$. Therefore $p$ divides $|Z_G|$. Since $1 \in Z_G$, we have $|Z_G| \ge 1$, and hence $|Z_G| \ge p > 1$. □
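Proposition 4.37 and Corollary 4.38 can be illustrated on the dihedral group of order $8 = 2^3$. The Python sketch below is our own illustration: it generates that group as permutations of the vertices 0, 1, 2, 3 of a square, computes the conjugacy classes and centralizers, checks $|G| = |\mathrm{Cl}(x)|\,|Z_G(x)|$ for each class, and finds a center of order 2, which is divisible by $p = 2$.

```python
def mul(s, t):
    return tuple(s[t[i]] for i in range(len(t)))

def inv(s):
    r = [0] * len(s)
    for i, si in enumerate(s):
        r[si] = i
    return tuple(r)

def generate(gens):
    """Closure of a set of permutations under multiplication (a finite group)."""
    identity = tuple(range(len(gens[0])))
    group, frontier = {identity}, [identity]
    while frontier:
        x = frontier.pop()
        for g in gens:
            y = mul(g, x)
            if y not in group:
                group.add(y)
                frontier.append(y)
    return group

r = (1, 2, 3, 0)          # rotation of the square's vertices: i -> i + 1 mod 4
s = (0, 3, 2, 1)          # reflection: i -> -i mod 4
D4 = generate([r, s])     # dihedral group of order 8 = 2^3

classes, seen = [], set()
for x in sorted(D4):
    if x not in seen:
        cl = {mul(mul(g, x), inv(g)) for g in D4}    # conjugacy class Cl(x)
        classes.append(cl)
        seen |= cl
for cl in classes:
    x = next(iter(cl))
    centralizer = [g for g in D4 if mul(g, x) == mul(x, g)]
    assert len(D4) == len(cl) * len(centralizer)     # Proposition 4.37

center = [x for x in D4 if all(mul(g, x) == mul(x, g) for g in D4)]
print(len(D4), sorted(len(cl) for cl in classes), len(center))   # 8 [1, 1, 2, 2, 2] 2
```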

Corollary 4.39. If $G$ is a finite group of order $p^2$ with $p$ prime, then $G$ is abelian.

PROOF. From Corollary 4.38 and Lagrange's Theorem we see that either $|Z_G| = p^2$, in which case $G$ is abelian, or $|Z_G| = p$. We show that the latter is impossible. In fact, if $x$ is not in $Z_G$, then $Z_G(x)$ is a subgroup of $G$ that contains $Z_G$ and the element $x$. Its order is then a divisor of $p^2$ that is larger than $p$, so it must have order $p^2$ and be all of $G$. Hence every element of $G$ commutes with $x$, and $x$ is in $Z_G$, contradiction. □

Corollary 4.40. If $G$ is a finite group whose order is a positive power $p^n$ of a prime $p$, then there exist normal subgroups $G_k$ of $G$ for $0 \le k \le n$ such that $|G_k| = p^k$ for all $k$ and such that $G_k \subseteq G_{k+1}$ for all $k < n$.

PROOF. We proceed by induction on $n$. The base case of the induction is $n = 1$ and is handled by Corollary 4.9. Assume inductively that the result holds for $n$, and let $G$ have order $p^{n+1}$. Corollary 4.38 shows that $Z_G \ne \{1\}$. Any element $\ne 1$ in $Z_G$ must have order a power of $p$, and some power of it must therefore have order $p$. Thus let $a$ be an element of $Z_G$ of order $p$, and let $H$ be the subgroup consisting of the powers of $a$. Then $H$ is normal and has order $p$. Let $G' = G/H$ be the quotient group, and let $\varphi : G \to G'$ be the quotient homomorphism. The group $G'$ has order $p^n$, and the inductive hypothesis shows that $G'$ has normal subgroups $G'_k$ for $0 \le k \le n$ such that $|G'_k| = p^k$ for $k \le n$ and $G'_k \subseteq G'_{k+1}$ for $k \le n - 1$. For $1 \le k \le n + 1$, define $G_k = \varphi^{-1}(G'_{k-1})$, and let $G_0 = \{1\}$. The First Isomorphism Theorem (Theorem 4.13) shows that each $G_k$ for $k \ge 1$ is a normal subgroup of $G$ containing $H$ and that $\varphi(G_k) = G'_{k-1}$. Then $\varphi|_{G_k}$ is a homomorphism of $G_k$ onto $G'_{k-1}$ with kernel $H$, and hence $|G_k| = |G'_{k-1}|\,|H| = p^{k-1}p = p^k$. Therefore the $G_k$'s will serve as the required subgroups of $G$. □

It is not always so easy to determine the conjugacy classes in a particular group. For example, in $GL(n, \mathbb{C})$ the question of conjugacy is the question whether two matrices are similar in the sense of Section II.3; this will be one of the main problems addressed in Chapter V. By contrast, the problem of conjugacy in symmetric groups has a simple answer. Recall that every permutation is uniquely (up to the order of the factors) the product of disjoint cycles. The cycle structure of a permutation consists of the number of cycles of each length in this decomposition.
Lemma 4.41. Let $\sigma$ and $\tau$ be members of the symmetric group $\mathfrak{S}_n$. If $\sigma$ is expressed as the product of disjoint cycles, then $\tau\sigma\tau^{-1}$ has the same cycle structure as $\sigma$, and the expression for $\tau\sigma\tau^{-1}$ as the product of disjoint cycles is obtained from that for $\sigma$ by substituting $\tau(k)$ for $k$ throughout.

Remark. For example, if $\sigma = (a\ b)(c\ d\ e)$, then $\tau\sigma\tau^{-1}$ decomposes as $(\tau(a)\ \tau(b))(\tau(c)\ \tau(d)\ \tau(e))$.

PROOF. Because the conjugate of a product equals the product of the conjugates, it is enough to handle a cycle $\gamma = (a_1\ a_2\ \cdots\ a_r)$ appearing in $\sigma$. The conjugate $\tau\gamma\tau^{-1}$ is asserted to be the cycle $\gamma' = (\tau(a_1)\ \tau(a_2)\ \cdots\ \tau(a_r))$. Application of $\tau^{-1}$ to $\tau(a_j)$ yields $a_j$, application of $\gamma$ to this yields $a_{j+1}$ if $j < r$ and $a_1$ if $j = r$, and application of $\tau$ to the result yields $\tau(a_{j+1})$ or $\tau(a_1)$. For each of the symbols $b$ not in the list $\{a_1, \dots, a_r\}$, $\tau\gamma\tau^{-1}(\tau(b)) = \tau(b)$ since $\gamma(b) = b$. Thus $\tau\gamma\tau^{-1} = \gamma'$, as asserted. □

Proposition 4.42. Let $H$ be a subgroup of a symmetric group $\mathfrak{S}_n$. If $\mathrm{Cl}(x)$ denotes a conjugacy class in $H$, then all members of $\mathrm{Cl}(x)$ have the same cycle structure. Conversely if $H = \mathfrak{S}_n$, then the conjugacy class of a permutation $\sigma$ consists of all members of $\mathfrak{S}_n$ having the same cycle structure as $\sigma$.

PROOF. The first conclusion is immediate from Lemma 4.41. For the second conclusion, let $\sigma$ and $\sigma'$ have the same cycle structure, write their disjoint-cycle expansions so that corresponding cycles have the same length, and let $\tau$ be the permutation that moves, for each $k$, the $k$th symbol appearing in the disjoint-cycle expansion of $\sigma$ into the $k$th symbol in the corresponding expansion of $\sigma'$. Define $\tau$ on the remaining symbols (those fixed by $\sigma$) in any fashion that carries them one-one onto the symbols fixed by $\sigma'$. Application of the lemma shows that $\tau\sigma\tau^{-1} = \sigma'$. Thus any two permutations with the same cycle structure are conjugate. □
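The proof of Proposition 4.42 is constructive, and a Python sketch of it is short. The sketch below is our own illustration, with permutations of $\{0, \dots, n-1\}$ stored as tuples of images: it lists cycles including 1-cycles, so that fixed symbols are matched automatically, builds $\tau$, and confirms $\tau\sigma\tau^{-1} = \sigma'$; it also checks Lemma 4.41 over all of $\mathfrak{S}_4$.

```python
from itertools import permutations

def cycles(sigma):
    """Disjoint-cycle decomposition of a permutation of {0, ..., n-1}, 1-cycles included."""
    seen, result = set(), []
    for start in range(len(sigma)):
        if start not in seen:
            cycle, k = [], start
            while k not in seen:
                seen.add(k)
                cycle.append(k)
                k = sigma[k]
            result.append(cycle)
    return result

def cycle_type(sigma):
    return sorted(len(c) for c in cycles(sigma))

def conjugator(sigma, sigma_prime):
    """Build tau with tau sigma tau^{-1} = sigma_prime: match cycles of equal length
    and send the kth symbol of one expansion to the kth symbol of the other.
    Listing 1-cycles too means fixed symbols are matched, so tau is a permutation."""
    assert cycle_type(sigma) == cycle_type(sigma_prime)
    tau = [None] * len(sigma)
    for c, c_prime in zip(sorted(cycles(sigma), key=len),
                          sorted(cycles(sigma_prime), key=len)):
        for a, b in zip(c, c_prime):
            tau[a] = b
    return tuple(tau)

def mul(s, t):
    return tuple(s[t[i]] for i in range(len(t)))

def inv(s):
    r = [0] * len(s)
    for i, si in enumerate(s):
        r[si] = i
    return tuple(r)

sigma = (1, 0, 3, 4, 2, 5)         # (0 1)(2 3 4), with 5 fixed
sigma_prime = (4, 0, 5, 3, 1, 2)   # (0 4 1)(2 5), with 3 fixed
tau = conjugator(sigma, sigma_prime)
print(tau, mul(mul(tau, sigma), inv(tau)) == sigma_prime)   # (2, 5, 0, 4, 1, 3) True

# Lemma 4.41 on S_4: conjugation never changes the cycle structure.
S4 = list(permutations(range(4)))
lemma_ok = all(cycle_type(mul(mul(t, s), inv(t))) == cycle_type(s) for s in S4 for t in S4)
print(lemma_ok)                                             # True
```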


7. Semidirect Products

One more application of group actions to the structure theory of groups will be to the construction of “semidirect products” of groups. If $H$ is a group, then an isomorphism of $H$ with itself is called an automorphism. The set of automorphisms of $H$ is a group under composition, and we denote it by $\mathrm{Aut}\,H$. We are going to be interested in “group actions by automorphisms,” i.e., group actions of a group $G$ on a space $X$ when $X$ is itself a group and the action by each member of $G$ is an automorphism of the group structure of $X$; the group action is therefore a homomorphism of the form $\tau : G \to \mathrm{Aut}\,X$.

Example 1. In $\mathbb{R}^2$, we can identify the additive group of the underlying vector space with the group of translations $t_v(w) = v + w$: the identification associates a translation $t$ with the member $t(0)$ of $\mathbb{R}^2$. Let $H$ be the group of translations. The rotations about the origin in $\mathbb{R}^2$, namely the linear maps with matrices $\begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}$, form a group $G = SO(2)$ that acts on $\mathbb{R}^2$, hence acts on the set $H$ of translations. The linearity of the rotations says that the action of $G = SO(2)$ on the translations is by automorphisms of $H$, i.e., that each rotation, in its effect on $H$, is in $\mathrm{Aut}\,H$. Out of these data (the two groups $G$ and $H$ and a homomorphism of $G$ into $\mathrm{Aut}\,H$) we will construct below what amounts to the group of all rotations (about any point) and translations of $\mathbb{R}^2$. The construction is that of a “semidirect product.”

Example 2. Take any group $G$, and let $G$ act on $X = G$ by conjugation. Each conjugation $x \mapsto gxg^{-1}$ is an automorphism of $G$, and thus the action of $G$ on itself by conjugation is an action by automorphisms.

Let $G$ and $H$ be groups. Suppose that a group action $\tau : G \to \mathcal{F}(H)$ is given with $G$ acting on $H$ by automorphisms. That is, suppose that each map $h \mapsto \tau_g(h)$ is an automorphism of $H$. We define a group $G \times_\tau H$ whose underlying set will be the Cartesian product $G \times H$. The motivation for the definition of multiplication comes from Example 2, in which $\tau_g(h) = ghg^{-1}$. We want to write a product $g_1h_1g_2h_2$ in the form $g'h'$, and we can do so using the formula
$$g_1h_1g_2h_2 = g_1g_2(g_2^{-1}h_1g_2)h_2 = (g_1g_2)\big((\tau_{g_2^{-1}}(h_1))h_2\big).$$
Similarly the formula for inverses is motivated by the formula
$$(gh)^{-1} = h^{-1}g^{-1} = g^{-1}(gh^{-1}g^{-1}) = g^{-1}\tau_g(h^{-1}).$$

Proposition 4.43. Let $G$ and $H$ be groups, and let $\tau$ be a group action of $G$ on $H$ by automorphisms. Then the set-theoretic product $G \times H$ becomes a group $G \times_\tau H$ under the definitions
$$(g_1, h_1)(g_2, h_2) = \big(g_1g_2,\ (\tau_{g_2^{-1}}(h_1))h_2\big) \qquad\text{and}\qquad (g, h)^{-1} = \big(g^{-1},\ \tau_g(h^{-1})\big).$$
The mappings $i_1 : G \to G \times_\tau H$ and $i_2 : H \to G \times_\tau H$ given by $i_1(g) = (g, 1)$ and $i_2(h) = (1, h)$ are one-one homomorphisms, and $p_1 : G \times_\tau H \to G$ given by $p_1(g, h) = g$ is a homomorphism onto $G$. The images $G' = i_1(G)$ and $H' = i_2(H)$ are subgroups of $G \times_\tau H$ with $H'$ normal such that $G' \cap H' = \{1\}$, such that every element of $G \times_\tau H$ is the product of an element of $G'$ and an element of $H'$, and such that conjugation of $G'$ on $H'$ is given by $i_1(g)i_2(h)i_1(g)^{-1} = i_2(\tau_g(h))$.
Remark. The group $G \times_\tau H$ is called the external semidirect product¹⁶ of $G$ and $H$ with respect to $\tau$.

PROOF. For associativity we compute directly that
$$\big((g_1,h_1)(g_2,h_2)\big)(g_3,h_3) = \big(g_1g_2g_3,\ \tau_{g_3^{-1}}(\tau_{g_2^{-1}}(h_1)h_2)h_3\big)$$
and
$$(g_1,h_1)\big((g_2,h_2)(g_3,h_3)\big) = \big(g_1g_2g_3,\ \tau_{g_3^{-1}g_2^{-1}}(h_1)\tau_{g_3^{-1}}(h_2)h_3\big).$$
Since
$$\tau_{g_3^{-1}}(\tau_{g_2^{-1}}(h_1)h_2) = \tau_{g_3^{-1}}(\tau_{g_2^{-1}}(h_1))\,\tau_{g_3^{-1}}(h_2) = \tau_{g_3^{-1}g_2^{-1}}(h_1)\,\tau_{g_3^{-1}}(h_2),$$
we have a match. It is immediate that $(1, 1)$ is a two-sided identity. Since
$$(g,h)(g^{-1},\tau_g(h^{-1})) = (1,\tau_g(h)\tau_g(h^{-1})) = (1,\tau_g(hh^{-1})) = (1,\tau_g(1)) = (1,1)$$
and
$$(g^{-1},\tau_g(h^{-1}))(g,h) = (1,\tau_{g^{-1}}(\tau_g(h^{-1}))h) = (1,\tau_1(h^{-1})h) = (1,1),$$
$(g^{-1}, \tau_g(h^{-1}))$ is indeed a two-sided inverse of $(g, h)$. It is immediate from the definition of multiplication that $i_1$, $i_2$, and $p_1$ are homomorphisms, that $i_1$ and $i_2$ are one-one, that $p_1$ is onto, that $G' \cap H' = \{1\}$, and that $G \times_\tau H = G'H'$. Since $i_1$ and $i_2$ are homomorphisms, $G'$ and $H'$ are subgroups. Since $H'$ is the kernel of $p_1$, $H'$ is normal. Finally the definition of multiplication gives $i_1(g)i_2(h)i_1(g)^{-1} = (g, h)(g, 1)^{-1} = (g, h)(g^{-1}, 1) = (1, \tau_g(h)) = i_2(\tau_g(h))$, and the proof is complete. □
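The formulas of Proposition 4.43 translate directly into code. The Python sketch below (the function name `semidirect` and the example are ours) builds the multiplication and inversion from the two groups and $\tau$, then brute-forces the group axioms and the conjugation formula for $\mathbb{Z}/2\mathbb{Z}$ acting on $\mathbb{Z}/5\mathbb{Z}$ by $h \mapsto -h$, both written additively; by Example 1 below, this product is a copy of $D_5$.

```python
from itertools import product

def semidirect(G_mul, G_inv, H_mul, H_inv, tau):
    """Multiplication and inversion on G x H as in Proposition 4.43;
    tau(g, h) stands for tau_g(h)."""
    def mul(x, y):
        (g1, h1), (g2, h2) = x, y
        return (G_mul(g1, g2), H_mul(tau(G_inv(g2), h1), h2))

    def inv(x):
        g, h = x
        return (G_inv(g), tau(g, H_inv(h)))

    return mul, inv

# Assumed example: G = Z/2Z acting on H = Z/5Z (both additive) by h -> -h.
n = 5

def tau(g, h):
    return h % n if g == 0 else (-h) % n

mul, inv = semidirect(lambda a, b: (a + b) % 2, lambda a: (-a) % 2,
                      lambda a, b: (a + b) % n, lambda a: (-a) % n, tau)
S = list(product(range(2), range(n)))
e = (0, 0)

assoc = all(mul(mul(x, y), z) == mul(x, mul(y, z)) for x in S for y in S for z in S)
identity_ok = all(mul(e, x) == x == mul(x, e) for x in S)
inverse_ok = all(mul(x, inv(x)) == e == mul(inv(x), x) for x in S)
# i1(g) i2(h) i1(g)^{-1} = i2(tau_g(h))
conj_ok = all(mul(mul((g, 0), (0, h)), inv((g, 0))) == (0, tau(g, h))
              for g in range(2) for h in range(n))
nonabelian = any(mul(x, y) != mul(y, x) for x in S for y in S)
print(assoc, identity_ok, inverse_ok, conj_ok, nonabelian)   # True True True True True
```

Passing the group operations in as functions keeps the sketch independent of how $G$ and $H$ are encoded; only the requirement that $\tau$ be an action by automorphisms is assumed.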

Proposition 4.44. Let $S$ be a group, let $G$ and $H$ be subgroups with $H$ normal, and suppose that $G \cap H = \{1\}$ and that every element of $S$ is the product of an element of $G$ and an element of $H$. For each $g \in G$, define an automorphism $\tau_g$ of $H$ by $\tau_g(h) = ghg^{-1}$. Then $\tau$ is a group action of $G$ on $H$ by automorphisms, and the mapping $G \times_\tau H \to S$ given by $(g, h) \mapsto gh$ is an isomorphism of groups.

Remarks. In this case we call $S$ an internal semidirect product of $G$ and $H$ with respect to $\tau$. We shall not attempt to write down a universal mapping property that characterizes internal semidirect products.

PROOF. Since $\tau_{g_1g_2}(h) = g_1g_2hg_2^{-1}g_1^{-1} = g_1\tau_{g_2}(h)g_1^{-1} = \tau_{g_1}\tau_{g_2}(h)$ and since each $\tau_g$ is an automorphism of $H$, $\tau$ is an action by automorphisms. Proposition 4.43 therefore shows that $G \times_\tau H$ is a well-defined group. The function $\varphi$ from $G \times_\tau H$ to $S$ given by $\varphi(g, h) = gh$ is a homomorphism by the same computation that motivated the definition of multiplication in a semidirect product, and $\varphi$ is onto $S$ since every element of $S$ lies in the set $GH$ of products. If $gh = 1$, then $g = h^{-1}$ exhibits $g$ as in $G \cap H = \{1\}$. Hence $g = 1$ and $h = 1$. Therefore $\varphi$ is one-one and must be an isomorphism. □

¹⁶ The notation $G \ltimes H$ is used by some authors in place of $G \times_\tau H$. The normal subgroup goes on the open side of the $\ltimes$ and on the side of the subscript $\tau$ in $\times_\tau$.
Example 1. Dihedral groups $D_n$. We show that $D_n$ is the internal semidirect product of a 2-element group and the rotation subgroup. Let $H$ be the group of rotations about the origin through multiples of the angle $2\pi/n$. This group is cyclic of order $n$, and it is normal in $D_n$ because it is of index 2. If $s$ is any of the reflections in $D_n$, then $G = \{1, s\}$ is a subgroup of $D_n$ of order 2 with $G \cap H = \{1\}$. Let $r$ be the rotation counterclockwise about the origin through the angle $2\pi/n$. Counting the elements, we see that every element of $D_n$ is of the form $r^k$ or $sr^k$, in other words that the set of products $GH$ is all of $D_n$. Thus Proposition 4.44 shows that $D_n$ is an (internal) semidirect product of $G$ and $H$ with respect to some $\tau : G \to \mathrm{Aut}\,H$. To understand the homomorphism $\tau$, let us write the members of $H$ as the powers of $r$. For the reflection $s$ (or indeed for any reflection in $D_n$), a look at the geometry shows that $sr^ks^{-1} = r^{-k}$ for all $k$. In other words, the automorphism $\tau(1)$ leaves each element of $H$ fixed while $\tau(s)$ sends each $r^k$ to $r^{-k}$, i.e., each $k$ mod $n$ to $-k$ mod $n$. The map that sends each element of a cyclic group to its group inverse is indeed an automorphism of the cyclic group, and thus $\tau$ is indeed a homomorphism of $G$ into $\mathrm{Aut}\,H$.

Example 2. Construction of a nonabelian group of order 21. Let $H = C_7$, written multiplicatively with generator $a$, and let $G = C_3$, written multiplicatively with generator $b$. To arrange for $G$ to act on $H$ by automorphisms, we make use of a nontrivial automorphism of $H$ of order 3. Such a mapping is $a^k \mapsto a^{2k}$. In fact, there is no doubt that this mapping is an automorphism (2 and 7 are relatively prime), and we have to see that it has order 3. The effect of applying it twice is $a^k \mapsto a^{4k}$, and the effect of applying it three times is $a^k \mapsto a^{8k}$. But $a^{8k} = a^k$ since $a^7 = 1$, and thus the mapping $a^k \mapsto a^{2k}$ indeed has order 3. We send $b^n$ into the $n$th power of this automorphism, and the result is a homomorphism $\tau : G \to \mathrm{Aut}\,H$. The semidirect product $G \times_\tau H$ is certainly a group of order $3 \times 7 = 21$. To see that it is nonabelian, we observe from the group law in Proposition 4.43 that $ab = b\tau_{b^{-1}}(a) = ba^4$. Thus $ab \ne ba$, and $G \times_\tau H$ is nonabelian.
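Example 2 can be checked by machine. In the Python sketch below (our own encoding), the pair $(m, k)$ stands for $b^m a^k$, and multiplication follows the group law of Proposition 4.43 with $\tau_{b^m}(a^k) = a^{2^m k}$; the script verifies associativity on all 21 elements and recomputes $ab = ba^4 \ne ba$.

```python
def mul(x, y):
    """Product in C_3 x_tau C_7, where (m, k) stands for b^m a^k and tau_{b^m}(a^k) = a^(2^m k).
    tau_{b^(-m2)} multiplies exponents by 2^(-m2 mod 3) mod 7, since 2^3 = 8 = 1 mod 7."""
    (m1, k1), (m2, k2) = x, y
    return ((m1 + m2) % 3, (pow(2, (-m2) % 3, 7) * k1 + k2) % 7)

elements = [(m, k) for m in range(3) for k in range(7)]
a, b = (0, 1), (1, 0)

assoc = all(mul(mul(x, y), z) == mul(x, mul(y, z))
            for x in elements for y in elements for z in elements)
print(len(elements), assoc)      # 21 True
print(mul(a, b), mul(b, a))      # (1, 4) (1, 1): ab = b a^4, while ba = b a
```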

It  is  instructive  to  generalize  the  construction  in  Example  2  a  little  bit.  To  do 
so,  we  need  a  lemma. 

Lemma 4.45. If $p$ is a prime, then the automorphisms of the additive group of the field $\mathbb{F}_p$ are the multiplications by the members of the multiplicative group $\mathbb{F}_p^\times$, and consequently $\operatorname{Aut} C_p$ is isomorphic to a cyclic group $C_{p-1}$.

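PROOF. Let us write $\operatorname{Aut}\mathbb{F}_p$ for the automorphism group of the additive group of $\mathbb{F}_p$. For each $a \in \mathbb{F}_p^\times$, the function $\varphi_a : \mathbb{F}_p \to \mathbb{F}_p$ given by $\varphi_a(n) = na$, taken modulo $p$, is in $\operatorname{Aut}\mathbb{F}_p$ as a consequence of the distributive law. We define a function $\Phi : \operatorname{Aut}\mathbb{F}_p \to \mathbb{F}_p^\times$ by $\Phi(\varphi) = \varphi(1)$ for $\varphi \in \operatorname{Aut}\mathbb{F}_p$. Here $\varphi(1) \neq 0$ because $\varphi$ is one-one. Again by the distributive law, $\varphi(n) = n\varphi(1)$ for every integer $n$. Thus if $\varphi_1$ and $\varphi_2$ are in $\operatorname{Aut}\mathbb{F}_p$, then $\Phi(\varphi_1 \circ \varphi_2) = (\varphi_1 \circ \varphi_2)(1) = \varphi_1(\varphi_2(1)) = \varphi_2(1)\varphi_1(1)$, and consequently $\Phi$ is a homomorphism. If a member $\varphi$ of $\operatorname{Aut}\mathbb{F}_p$ has $\Phi(\varphi) = 1$ in $\mathbb{F}_p^\times$, then $\varphi(1) = 1$ and therefore $\varphi(n) = n\varphi(1) = n$ for all $n$. Therefore $\varphi$ is the identity in $\operatorname{Aut}\mathbb{F}_p$. We conclude that $\Phi$ is one-one. If $a$ is given in $\mathbb{F}_p^\times$, then $\Phi(\varphi_a) = \varphi_a(1) = a$, and hence $\Phi$ is onto $\mathbb{F}_p^\times$. Therefore $\Phi$ is an isomorphism of $\operatorname{Aut}\mathbb{F}_p$ and $\mathbb{F}_p^\times$. By Corollary 4.27, $\Phi$ exhibits $\operatorname{Aut}\mathbb{F}_p$ as isomorphic to the cyclic group $C_{p-1}$. □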
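For small primes the lemma can be confirmed by brute force. The following Python sketch uses illustrative helper names. It lists every bijection of $\{0, \dots, p-1\}$ that respects addition modulo $p$ and compares that list with the multiplications $n \mapsto na$ for $a \in \mathbb{F}_p^\times$. It also checks that some $a$ has multiplicative order $p - 1$, which shows that $\mathbb{F}_p^\times$ is cyclic. Because it enumerates all $p!$ permutations, it is only practical for very small $p$.

```python
from itertools import permutations

def additive_automorphisms(p):
    """All bijections f of Z/p with f(x + y) = f(x) + f(y); brute force over p! maps."""
    return [f for f in permutations(range(p))
            if all(f[(x + y) % p] == (f[x] + f[y]) % p
                   for x in range(p) for y in range(p))]

def multiplication_maps(p):
    """The maps n -> n*a mod p for a in F_p^x, written as tuples of values."""
    return [tuple((n * a) % p for n in range(p)) for a in range(1, p)]

def multiplicative_order(a, p):
    """Least k >= 1 with a^k = 1 mod p (a not divisible by p)."""
    k, x = 1, a % p
    while x != 1:
        x = (x * a) % p
        k += 1
    return k

def is_cyclic_unit_group(p):
    """F_p^x is cyclic exactly when some element has order p - 1."""
    return any(multiplicative_order(a, p) == p - 1 for a in range(1, p))

print(len(additive_automorphisms(7)))  # 6
```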

Proposition 4.46. If $p$ and $q$ are primes with $p < q$ such that $p$ divides $q - 1$, then there exists a nonabelian group of order $pq$.

Remarks. For $p = 2$, the divisibility condition is automatic since $q > 2$ is odd, and the proof will yield the dihedral group $D_q$. For $p = 3$ and $q = 7$, the condition is that 3 divides $7 - 1$, and the constructed group will be the group in Example 2 above.

PROOF. Let $G = C_p$ with generator $a$, and let $H = C_q$. Lemma 4.45 shows that $\operatorname{Aut} C_q \cong C_{q-1}$. Let $b$ be a generator of $\operatorname{Aut} C_q$. Since $p$ divides $q - 1$, $b^{(q-1)/p}$ has order $p$. Then the map $a^k \mapsto b^{k(q-1)/p}$ is a well-defined homomorphism $\tau$ of $G$ into $\operatorname{Aut} H$, and it determines a semidirect product $S = G \times_\tau H$, by Proposition 4.43. The order of $S$ is $pq$, and the multiplication is nonabelian since for $h \in H$, we have $(a, 1)(1, h) = (a, h)$ and $(1, h)(a, 1) = (a, \tau_{a^{-1}}(h)) = (a, b^{-(q-1)/p}(h))$, but $b^{-(q-1)/p}$ is not the identity automorphism of $H$ because it has order $p$. □
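The construction in the proof can be carried out mechanically. The following Python sketch realizes $\operatorname{Aut} C_q$ as $\mathbb{F}_q^\times$ via Lemma 4.45. It finds a generator by brute force to stand in for $b$, and it lets $a$ act through an element $c$ of order $p$. It assumes the group law $(g_1, h_1)(g_2, h_2) = (g_1g_2, \tau_{g_2^{-1}}(h_1)h_2)$, which agrees with the formula $(1, h)(a, 1) = (a, \tau_{a^{-1}}(h))$ used in the proof. The function names are hypothetical.

```python
def primitive_root(q):
    """Smallest generator of the cyclic group F_q^x, found by brute force (q prime)."""
    for g in range(1, q):
        if len({pow(g, k, q) for k in range(1, q)}) == q - 1:
            return g
    raise ValueError("q must be prime")

def nonabelian_group(p, q):
    """Elements and multiplication of S = G x_tau H with G = C_p and H = C_q.

    A pair (i, k) stands for (a^i, h^k), where a generates G and h generates H.
    Aut C_q is identified with F_q^x (Lemma 4.45), and a acts as multiplication
    by c = g^((q-1)/p) for a primitive root g, which is an element of order p.
    """
    if (q - 1) % p != 0:
        raise ValueError("need p to divide q - 1")
    c = pow(primitive_root(q), (q - 1) // p, q)

    def tau(i, k):
        """Exponent of tau_{a^i}(h^k); c^p = 1 mod q, so i only matters mod p."""
        return (pow(c, i % p, q) * k) % q

    def mul(x, y):
        """Assumed law (g1, h1)(g2, h2) = (g1 g2, tau_{g2^{-1}}(h1) h2)."""
        (i1, k1), (i2, k2) = x, y
        return ((i1 + i2) % p, (tau(-i2, k1) + k2) % q)

    elements = [(i, k) for i in range(p) for k in range(q)]
    return elements, mul

elements, mul = nonabelian_group(3, 7)
print(len(elements), mul((1, 0), (0, 1)) == mul((0, 1), (1, 0)))  # 21 False
```

With $p = 2$ the same function produces a model of the dihedral group $D_q$, in line with the first remark.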
A group $G \neq \{1\}$ is said to be simple if its only normal subgroups are $\{1\}$ and $G$.

Among abelian groups the simple ones are the cyclic groups of prime order. Indeed, a cyclic group $C_p$ of prime order has no nontrivial subgroups at all, by Corollary 4.9. Conversely if $G$ is abelian and simple, let $a \neq 1$ be in $G$. Then $\{a^n\}$ is a cyclic subgroup and is normal since $G$ is abelian. Thus $\{a^n\}$ is all of $G$, and $G$ is cyclic. The group $\mathbb{Z}$ is not simple, having the nontrivial subgroup $2\mathbb{Z}$, and the group $\mathbb{Z}/(rs)\mathbb{Z}$ with $r > 1$ and $s > 1$ is not simple, having the multiples of $r$ as a nontrivial subgroup. Thus $G$ has to be cyclic of prime order.

The interest is in nonabelian simple groups. We shall establish that the alternating groups $\mathfrak{A}_n$ are simple for $n \ge 5$, and some other simple groups will be considered in Problems 55-62 at the end of the chapter.

Theorem 4.47. The alternating group $\mathfrak{A}_n$ is simple if $n \ge 5$.

Proof. Let $K \neq \{1\}$ be a normal subgroup of $\mathfrak{A}_n$. Choose $\sigma$ in $K$ with $\sigma \neq 1$ such that $\sigma(i) = i$ for the maximum possible number of integers $i$ with $1 \le i \le n$. The main step is to show that $\sigma$ is a 3-cycle. Arguing by contradiction, suppose that $\sigma$ is not a 3-cycle. Then there are two cases.

The first case is that the decomposition of $\sigma$ as the product of disjoint cycles contains a $k$-cycle for some $k \ge 3$. Without loss of generality, we may take the cycle in question to be $\gamma = (1\ 2\ 3\ \cdots)$, and then $\sigma = \gamma\rho = (1\ 2\ 3\ \cdots)\rho$ with $\rho$ equal to a product of disjoint cycles not containing the symbols appearing in $\gamma$. Being even and not being a 3-cycle, $\sigma$ moves at least two other symbols besides the three listed ones, say 4 and 5. Put $\tau = (3\ 4\ 5)$. Lemma 4.41 shows that $\sigma' = \tau\sigma\tau^{-1} = \gamma'\rho' = (1\ 2\ 4\ \cdots)\rho'$ with $\rho'$ not containing any of the symbols appearing in $\gamma'$. Thus $\sigma'\sigma^{-1}$ moves 3 into 4 and cannot be the identity. But $\sigma'\sigma^{-1}$ is in $K$ (since $\tau$ is even and $K$ is normal in $\mathfrak{A}_n$) and fixes all symbols other than 1, 2, 3, 4, 5 that are fixed by $\sigma$. In addition, $\sigma'\sigma^{-1}$ fixes 2, and none of 1, 2, 3, 4, 5 is fixed by $\sigma$. Thus $\sigma'\sigma^{-1}$ is a member of $K$ other than the identity that fixes more symbols than $\sigma$, and we have arrived at a contradiction.

The second case is that $\sigma$ is a product $\sigma = (1\ 2)(3\ 4)\cdots$ of disjoint transpositions. There must be at least two factors since $\sigma$ is even. Put $\tau = (1\ 2)(4\ 5)$, the symbol 5 existing since the group $\mathfrak{A}_n$ in question has $n \ge 5$. Then $\sigma' = \tau\sigma\tau^{-1} = (1\ 2)(3\ 5)\cdots$. Since $\sigma'\sigma^{-1}$ carries 4 into 5, $\sigma'\sigma^{-1}$ is a member of $K$ other than the identity. It fixes all symbols other than 1, 2, 3, 4, 5 that are fixed by $\sigma$, and in addition it fixes 1 and 2. Thus $\sigma'\sigma^{-1}$ fixes more symbols than $\sigma$ does, and again we have arrived at a contradiction.

We conclude that $K$ contains a 3-cycle, say $(1\ 2\ 3)$. If $i, j, k, l, m$ are five distinct symbols, then we can construct a permutation $\tau$ with $\tau(1) = i$, $\tau(2) = j$, $\tau(3) = k$, $\tau(4) = l$, and $\tau(5) = m$. If $\tau$ is odd, we replace $\tau$ by $(l\ m)\tau$, and the result is even. Thus we may assume that $\tau$ is in $\mathfrak{A}_n$ and has $\tau(1) = i$, $\tau(2) = j$, and $\tau(3) = k$. Lemma 4.41 shows that $\tau(1\ 2\ 3)\tau^{-1} = (i\ j\ k)$. Since $K$ is normal, we conclude that $K$ contains all 3-cycles.
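The step just completed can be checked by machine for small $n$. The Python sketch below uses the symbols $0, \dots, n-1$ instead of $1, \dots, n$, and its function names are illustrative. It computes the conjugates of the 3-cycle on the first three symbols under even permutations only, as normality in $\mathfrak{A}_n$ requires, and compares the result with the set of all 3-cycles. For $n = 5$ and $n = 6$ the two sets agree. For $n = 4$ they do not, because without two spare symbols the parity of $\tau$ cannot be corrected.

```python
from itertools import permutations

def compose(s, t):
    """(s t)(x) = s(t(x)): apply t first, then s; permutations are tuples."""
    return tuple(s[t[x]] for x in range(len(t)))

def inverse(s):
    inv = [0] * len(s)
    for x, sx in enumerate(s):
        inv[sx] = x
    return tuple(inv)

def is_even(s):
    """Parity via the number of inversions."""
    n = len(s)
    return sum(s[x] > s[y] for x in range(n) for y in range(x + 1, n)) % 2 == 0

def three_cycle(i, j, k, n):
    """The 3-cycle (i j k) on the symbols 0..n-1."""
    p = list(range(n))
    p[i], p[j], p[k] = j, k, i
    return tuple(p)

def conjugates_in_alternating(sigma):
    """All tau sigma tau^{-1} with tau an even permutation."""
    n = len(sigma)
    return {compose(compose(t, sigma), inverse(t))
            for t in permutations(range(n)) if is_even(t)}

def all_three_cycles(n):
    return {three_cycle(i, j, k, n)
            for i in range(n) for j in range(n) for k in range(n) if len({i, j, k}) == 3}

print(len(all_three_cycles(5)))  # 20
```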

To complete the proof, we show for $n \ge 3$ that every element of $\mathfrak{A}_n$ is a product of 3-cycles. If $\sigma$ is in $\mathfrak{A}_n$, we use Corollary 1.22 to decompose $\sigma$ as a product of transpositions. Since $\sigma$ is even, we can group these in pairs. If the members of a pair of transpositions are equal, their product is the identity and can be dropped. If they are distinct but not disjoint, then their product is a 3-cycle. If they are disjoint, then the identity $(1\ 2)(3\ 4) = (1\ 2\ 3)(2\ 3\ 4)$ shows that their product is a product of 3-cycles. This completes the proof. □
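The final step of the proof is constructive, and the Python sketch below follows it, using symbols $0, \dots, n-1$ and illustrative names. It writes a permutation as a product of transpositions via the standard formula $(a_1\ a_2\ \cdots\ a_k) = (a_1\ a_k)(a_1\ a_{k-1})\cdots(a_1\ a_2)$, which we use in place of whatever construction Corollary 1.22 provides. It then pairs consecutive transpositions and rewrites each pair using $(x\ y)(x\ z) = (x\ z\ y)$ or $(a\ b)(c\ d) = (a\ b\ c)(b\ c\ d)$. Products are composed right to left, matching the identity $(1\ 2)(3\ 4) = (1\ 2\ 3)(2\ 3\ 4)$ in the text.

```python
from itertools import permutations

def compose(s, t):
    """(s t)(x) = s(t(x)): apply t first, then s."""
    return tuple(s[t[x]] for x in range(len(t)))

def cycle_perm(cycle, n):
    """The permutation of 0..n-1 given by one cycle (c0 c1 ... c_{k-1})."""
    p = list(range(n))
    for x, y in zip(cycle, cycle[1:] + cycle[:1]):
        p[x] = y
    return tuple(p)

def to_transpositions(s):
    """Return [t1, t2, ...] with s = t1 t2 ..., using (a1 ... ak) = (a1 ak)...(a1 a2)."""
    seen, result = set(), []
    for start in range(len(s)):
        cyc, x = [], start
        while x not in seen:          # trace the cycle through start (empty if seen)
            seen.add(x)
            cyc.append(x)
            x = s[x]
        result += [(cyc[0], cyc[j]) for j in range(len(cyc) - 1, 0, -1)]
    return result

def to_three_cycles(s):
    """Rewrite an even permutation s as a list of 3-cycles whose product is s."""
    ts = to_transpositions(s)
    assert len(ts) % 2 == 0, "s must be even"
    cycles = []
    for (a, b), (c, d) in zip(ts[0::2], ts[1::2]):
        common = {a, b} & {c, d}
        if len(common) == 2:            # equal transpositions: product is 1
            continue
        if len(common) == 1:            # (x y)(x z) = (x z y)
            x = common.pop()
            y = b if a == x else a
            z = d if c == x else c
            cycles.append((x, z, y))
        else:                           # (a b)(c d) = (a b c)(b c d)
            cycles += [(a, b, c), (b, c, d)]
    return cycles

def product(cycles, n):
    """Compose the listed 3-cycles right to left."""
    result = tuple(range(n))
    for cyc in cycles:
        result = compose(result, cycle_perm(list(cyc), n))
    return result

# Check every even permutation of 5 symbols.
assert all(product(to_three_cycles(s), 5) == s
           for s in permutations(range(5)) if len(to_transpositions(s)) % 2 == 0)
print(to_three_cycles((1, 2, 0, 4, 3, 6, 5)))  # [(0, 1, 2), (3, 4, 5), (4, 5, 6)]
```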

Let $G$ be a group. A descending sequence

$$G_n \supseteq G_{n-1} \supseteq \cdots \supseteq G_1 \supseteq G_0$$

of subgroups of $G$ with $G_n = G$, $G_0 = \{1\}$, and each $G_{k-1}$ normal in $G_k$ is called a normal series for $G$. The normal series is called a composition series if each inclusion $G_k \supseteq G_{k-1}$ is proper and if each consecutive quotient $G_k/G_{k-1}$ is simple.
Examples.

(1) Let $G$ be a cyclic group of order $N$. A normal series for $G$ consists of certain subgroups of $G$, all necessarily cyclic by Proposition 4.4. Their respective orders $N_n, N_{n-1}, \dots, N_1, N_0$ have $N_n = N$, $N_0 = 1$, and $N_{k-1} \mid N_k$ for all $k$. The series is a composition series if and only if each quotient $N_k/N_{k-1}$ is prime. In this case the primes that occur are exactly the prime divisors of $N$, and a prime $p$ occurs $r$ times if $p^r$ is the exact power of $p$ that divides $N$. Thus the consecutive quotients from a composition series of this $G$, up to isomorphism, are independent of the particular composition series, though they may arise in a different order.

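(2) For $G = \mathbb{Z}$, a normal series is of the form

$$\mathbb{Z} \supseteq m_1\mathbb{Z} \supseteq m_1m_2\mathbb{Z} \supseteq m_1m_2m_3\mathbb{Z} \supseteq \cdots \supseteq 0.$$

The group $G = \mathbb{Z}$ has no composition series, since in any normal series the last term before 0 is some $m\mathbb{Z}$ with $m \neq 0$, and the quotient $m\mathbb{Z}/0 \cong \mathbb{Z}$ is not simple.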

(3) For the symmetric group $G = \mathfrak{S}_4$, let $C_2 \times C_2$ refer to the 4-element subgroup $\{1, (1\ 2)(3\ 4), (1\ 3)(2\ 4), (1\ 4)(2\ 3)\}$. The series

$$\mathfrak{S}_4 \supseteq \mathfrak{A}_4 \supseteq C_2 \times C_2 \supseteq \{1, (1\ 2)(3\ 4)\} \supseteq \{1\}$$

is a composition series, the consecutive quotients being $C_2$, $C_3$, $C_2$, $C_2$. Each term in the composition series except for $\{1, (1\ 2)(3\ 4)\}$ is actually normal in the whole group $G$, but there is no way to choose the 2-element subgroup to make it normal in $G$. The other two possible choices of 2-element subgroup, which lead to different composition series but with isomorphic consecutive quotients, are obtained by replacing $\{1, (1\ 2)(3\ 4)\}$ by $\{1, (1\ 3)(2\ 4)\}$ and again by $\{1, (1\ 4)(2\ 3)\}$.

(4) For the symmetric group $G = \mathfrak{S}_5$, the series

$$\mathfrak{S}_5 \supseteq \mathfrak{A}_5 \supseteq \{1\}$$

is a composition series, the consecutive quotients being $C_2$ and $\mathfrak{A}_5$.

(5) Let $G$ be a finite group of order $p^n$ with $p$ prime. Corollary 4.40 produces a composition series, and this time all the subgroups are normal in $G$. The successive normal subgroups have orders $p^k$ for $k = n, n-1, \dots, 0$, and each consecutive quotient is isomorphic to $C_p$.
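Example (1) can be explored concretely. The subgroups of a cyclic group of order $N$ correspond to the divisors of $N$, so a composition series amounts to a chain $N = N_n, N_{n-1}, \dots, N_0 = 1$ in which each ratio $N_k/N_{k-1}$ is prime. The Python sketch below, with illustrative names, lists all such chains together with their quotients. Running it shows the same multiset of primes appearing in different orders.

```python
def prime_factors(N):
    """Prime factorization of N as a sorted list with multiplicity."""
    factors, p = [], 2
    while p * p <= N:
        while N % p == 0:
            factors.append(p)
            N //= p
        p += 1
    if N > 1:
        factors.append(N)
    return factors

def composition_series(N):
    """All chains [N, ..., 1] of divisors in which each ratio is prime."""
    if N == 1:
        return [[1]]
    chains = []
    for p in sorted(set(prime_factors(N))):
        for tail in composition_series(N // p):
            chains.append([N] + tail)
    return chains

def quotients(chain):
    """Orders N_k / N_{k-1} of the consecutive quotients."""
    return [a // b for a, b in zip(chain, chain[1:])]

for chain in composition_series(12):
    print(chain, quotients(chain))
# [12, 6, 3, 1] [2, 2, 3]
# [12, 6, 2, 1] [2, 3, 2]
# [12, 4, 2, 1] [3, 2, 2]
```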

Historically the Jordan-Hölder Theorem addressed composition series for groups, showing that the consecutive quotients, up to isomorphism, are independent of the particular composition series. They can then consistently be called the composition factors of the group. Finding the composition factors of a particular group may be regarded as a step toward understanding the structure of the group. A generalization of the Jordan-Hölder Theorem due to Zassenhaus and Schreier applies to normal series in situations in which composition series might not exist, such as Example 2 above. We prove the Zassenhaus-Schreier Theorem, and the Jordan-Hölder Theorem is then a special case.

Two normal series

$$G_m \supseteq G_{m-1} \supseteq \cdots \supseteq G_1 \supseteq G_0 \quad\text{and}\quad H_n \supseteq H_{n-1} \supseteq \cdots \supseteq H_1 \supseteq H_0$$

for the same group $G$ are said to be equivalent normal series if $m = n$ and the order of the consecutive quotients $G_m/G_{m-1}, G_{m-1}/G_{m-2}, \dots, G_1/G_0$ may be rearranged so that they are respectively isomorphic to $H_m/H_{m-1}, H_{m-1}/H_{m-2}, \dots, H_1/H_0$. One normal series is said to be a refinement of another if the subgroups appearing in the second normal series all appear as subgroups in the first normal series.

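Lemma 4.48 (Zassenhaus). Let $G_1$, $G_2$, $G_1'$, and $G_2'$ be subgroups of a group $G$ with $G_1' \subseteq G_1$ and $G_2' \subseteq G_2$, $G_1'$ normal in $G_1$, and $G_2'$ normal in $G_2$. Then $(G_1 \cap G_2')G_1'$ is normal in $(G_1 \cap G_2)G_1'$, $(G_1' \cap G_2)G_2'$ is normal in $(G_1 \cap G_2)G_2'$, and

$$\bigl((G_1 \cap G_2)G_1'\bigr)/\bigl((G_1 \cap G_2')G_1'\bigr) \cong \bigl((G_1 \cap G_2)G_2'\bigr)/\bigl((G_1' \cap G_2)G_2'\bigr).$$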

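Proof. Let us check that $(G_1 \cap G_2')G_1'$ is normal in $(G_1 \cap G_2)G_1'$. Handling conjugation by members of $G_1 \cap G_2$ is straightforward: If $g$ is in $G_1 \cap G_2$, then $g(G_1 \cap G_2')g^{-1} = G_1 \cap G_2'$ since $g$ is in $G_1$ and $gG_2'g^{-1} = G_2'$ (because $g$ is in $G_2$). Also, $gG_1'g^{-1} = G_1'$ since $g$ is in $G_1$. Hence $g(G_1 \cap G_2')G_1'g^{-1} = (G_1 \cap G_2')G_1'$.

Handling conjugation by members of $G_1'$ requires a little trick: Let $g$ be in $G_1'$ and let $hg'$ be in $(G_1 \cap G_2')G_1'$, with $h \in G_1 \cap G_2'$ and $g' \in G_1'$. Then $g(hg')g^{-1} = h(h^{-1}gh)g'g^{-1}$. The left factor $h$ is in $G_1 \cap G_2'$. The remaining factors are in $G_1'$. For $g'$ and $g^{-1}$ this is a matter of definition, and for $h^{-1}gh$ it follows because $h$ is in $G_1$, $g$ is in $G_1'$, and $G_1'$ is normal in $G_1$. Thus $g(G_1 \cap G_2')G_1'g^{-1} = (G_1 \cap G_2')G_1'$. Since every member of $(G_1 \cap G_2)G_1'$ is a product of a member of $G_1 \cap G_2$ and a member of $G_1'$, $(G_1 \cap G_2')G_1'$ is normal in $(G_1 \cap G_2)G_1'$. The other assertion about normal subgroups holds by symmetry in the indices 1 and 2.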

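By the Second Isomorphism Theorem (Theorem 4.14),

$$\begin{aligned} (G_1 \cap G_2)/\bigl(((G_1 \cap G_2')G_1') \cap (G_1 \cap G_2)\bigr) &\cong \bigl((G_1 \cap G_2)(G_1 \cap G_2')G_1'\bigr)/\bigl((G_1 \cap G_2')G_1'\bigr) \\ &= \bigl((G_1 \cap G_2)G_1'\bigr)/\bigl((G_1 \cap G_2')G_1'\bigr). \end{aligned} \qquad (*)$$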
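Since we have

$$\bigl((G_1 \cap G_2')G_1'\bigr) \cap (G_1 \cap G_2) = \bigl((G_1 \cap G_2')G_1'\bigr) \cap G_2 = (G_1 \cap G_2')(G_1' \cap G_2)$$

(the first equality because $(G_1 \cap G_2')G_1' \subseteq G_1$, the second because $G_1 \cap G_2' \subseteq G_2$), we can rewrite the conclusion of (*) as

$$(G_1 \cap G_2)/\bigl((G_1 \cap G_2')(G_1' \cap G_2)\bigr) \cong \bigl((G_1 \cap G_2)G_1'\bigr)/\bigl((G_1 \cap G_2')G_1'\bigr). \qquad (**)$$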

The  left  side  of  (**)  is  symmetric  under  interchange  of  the  indices  1  and  2.  Hence 
so  is  the  right  side,  and  the  lemma  follows.  □ 


Theorem  4.49  (Schreier).  Any  two  normal  series  of  a  group  G  have  equivalent 
refinements. 

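Proof. Let the two normal series, numbered this time from the top down, be

$$G = G_0 \supseteq G_1 \supseteq \cdots \supseteq G_{m-1} \supseteq G_m = \{1\},$$
$$G = H_0 \supseteq H_1 \supseteq \cdots \supseteq H_{n-1} \supseteq H_n = \{1\},$$

and define

$$G_{ij} = (G_i \cap H_j)G_{i+1} \quad\text{for } 0 \le i < m \text{ and } 0 \le j \le n, \qquad (*)$$
$$H_{ji} = (G_i \cap H_j)H_{j+1} \quad\text{for } 0 \le j < n \text{ and } 0 \le i \le m. \qquad (**)$$

By Lemma 4.48, applied with $G_i$, $G_{i+1}$, $H_j$, $H_{j+1}$ in the roles of $G_1$, $G_1'$, $G_2$, $G_2'$, each $G_{i,j+1}$ is normal in $G_{ij}$ and each $H_{j,i+1}$ is normal in $H_{ji}$. Then we obtain respective refinements of the two normal series given by

$$\begin{aligned} G = G_{00} &\supseteq G_{01} \supseteq \cdots \supseteq G_{0n} \supseteq G_{10} \supseteq G_{11} \supseteq \cdots \supseteq G_{1n} \supseteq \cdots \supseteq G_{m-1,n} = \{1\}, \\ G = H_{00} &\supseteq H_{01} \supseteq \cdots \supseteq H_{0m} \supseteq H_{10} \supseteq H_{11} \supseteq \cdots \supseteq H_{1m} \supseteq \cdots \supseteq H_{n-1,m} = \{1\}. \end{aligned} \qquad (\dagger)$$

The containments $G_{in} \supseteq G_{i+1,0}$ and $H_{jm} \supseteq H_{j+1,0}$ are equalities in $(\dagger)$, and the only consecutive quotients that can be nontrivial are therefore of the form $G_{ij}/G_{i,j+1}$ and $H_{ji}/H_{j,i+1}$. For these we have

$$\begin{aligned} G_{ij}/G_{i,j+1} &= \bigl((G_i \cap H_j)G_{i+1}\bigr)/\bigl((G_i \cap H_{j+1})G_{i+1}\bigr) && \text{by } (*) \\ &\cong \bigl((G_i \cap H_j)H_{j+1}\bigr)/\bigl((G_{i+1} \cap H_j)H_{j+1}\bigr) && \text{by Lemma 4.48} \\ &= H_{ji}/H_{j,i+1} && \text{by } (**), \end{aligned}$$

and thus the refinements $(\dagger)$ are equivalent. □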
Corollary 4.50 (Jordan-Hölder Theorem). Any two composition series of a group $G$ are equivalent as normal series.

PROOF.  Let  two  composition  series  be  given.  Theorem  4.49  says  that  we 
can  insert  terms  in  each  so  that  the  refined  series  have  the  same  length  and  are 
equivalent.  Since  the  given  series  are  composition  series,  the  only  way  to  insert 
a  new  term  is  by  repeating  some  term,  and  the  repetition  results  in  a  consecutive 
quotient  of  {1}.  Because  of  Theorem  4.49  we  know  that  the  quotients  {1}  from 
the  two  refined  series  must  match.  Thus  the  number  of  terms  added  to  each  series 
is  the  same.  Also,  the  quotients  that  are  not  {1}  must  match  in  pairs.  Thus  the 
given  composition  series  are  equivalent.  □ 


9.  Structure  of  Finitely  Generated  Abelian  Groups 

A  set  of  generators  for  a  group  G  is  a  set  such  that  each  element  of  G  is  a  finite 
product  of  generators  and  their  inverses.  (A  generator  and  its  inverse  are  allowed 
to  occur  multiple  times  in  a  product.) 

In  this  section  we  shall  study  abelian  groups  having  a  finite  set  of  generators. 
Such  groups  are  said  to  be  finitely  generated  abelian  groups,  and  our  goal  is 
to  classify  them  up  to  isomorphism.  We  use  additive  notation  for  all  our  abelian 
groups in this section. We begin by introducing an analog $\mathbb{Z}^n$ for the integers $\mathbb{Z}$ of the vector space $\mathbb{R}^n$ for the reals $\mathbb{R}$, and along with it a generalization.

A free abelian group is any abelian group isomorphic to a direct sum, finite or infinite, of copies of the additive group $\mathbb{Z}$ of integers. The external direct sum of $n$ copies of $\mathbb{Z}$ will be denoted by $\mathbb{Z}^n$. Let us use Proposition 4.17 to see that we can recognize groups isomorphic to free abelian groups by means of the following condition: an abelian group $G$ is isomorphic to a free abelian group if and only if it has a $\mathbb{Z}$ basis, i.e., a subset that generates $G$ and is such that no nontrivial linear combination, with integer coefficients, of the members of the subset is equal to the 0 element of the group. It will be helpful to use terminology adapted from the theory of vector spaces for this latter condition, namely that the subset is to be linearly independent over $\mathbb{Z}$.

Let us give the proof that the condition is necessary and sufficient for $G$ to be free abelian. In one direction if $G$ is an external direct sum of copies of $\mathbb{Z}$, then the members of $G$ that are 1 in one coordinate and are 0 elsewhere form a $\mathbb{Z}$ basis. Conversely if $\{g_s\}_{s \in S}$ is a $\mathbb{Z}$ basis, let $G_{s_0}$ be the subgroup of multiples of $g_{s_0}$, and let $\varphi_{s_0}$ be the inclusion homomorphism of $G_{s_0}$ into $G$. Each $G_{s_0}$ is isomorphic to $\mathbb{Z}$, since linear independence forces $g_{s_0}$ to have infinite order. Proposition 4.17 produces a unique group homomorphism $\varphi : \bigoplus_{s \in S} G_s \to G$ such that $\varphi \circ i_{s_0} = \varphi_{s_0}$ for all $s_0 \in S$. The spanning condition for the $\mathbb{Z}$ basis says that $\varphi$ is onto $G$, and the linear independence condition for the $\mathbb{Z}$ basis says that $\varphi$ has 0 kernel.
The similarity between vector-space bases and $\mathbb{Z}$ bases suggests further comparison of vector spaces and abelian groups. With vector spaces over a field, every vector space has a basis over the field. However, it is exceptional for an abelian group to have a $\mathbb{Z}$ basis. Two examples that hint at the difficulty are the additive group $\mathbb{Z}/m\mathbb{Z}$ with $m > 1$ and the additive group $\mathbb{Q}$. The group $\mathbb{Z}/m\mathbb{Z}$ has no nonempty linearly independent set, while the group $\mathbb{Q}$ has a linearly independent set of one element, no spanning set of one element, and no linearly independent set of more than one element. Here are two positive examples.

Examples. 

(1) The additive group of all points in $\mathbb{R}^n$ whose coordinates are integers. The standard basis of $\mathbb{R}^n$ is a $\mathbb{Z}$ basis.

(2) The additive group of all points $(x, y)$ in $\mathbb{R}^2$ with $x$ and $y$ both in $\mathbb{Z}$ or both in $\mathbb{Z} + \tfrac12$. The set $\{(1, 0), (\tfrac12, \tfrac12)\}$ is a $\mathbb{Z}$ basis.
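The claim in Example (2) reduces to solving a 2-by-2 linear system. The equation $(x, y) = m(1, 0) + n(\tfrac12, \tfrac12)$ forces $n = 2y$ and $m = x - y$, so the solution is unique, which gives linear independence. The coefficients are integers exactly when $(x, y)$ lies in the group, which gives spanning. Here is a Python sketch with exact rational arithmetic; the function names are illustrative.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def in_group(x, y):
    """Membership in Example (2): x, y both in Z, or both in Z + 1/2."""
    both_integral = x.denominator == 1 and y.denominator == 1
    both_half = (x - HALF).denominator == 1 and (y - HALF).denominator == 1
    return both_integral or both_half

def coordinates(x, y):
    """Solve (x, y) = m*(1, 0) + n*(1/2, 1/2); the unique solution is n = 2y, m = x - y."""
    return x - y, 2 * y

def is_integral(q):
    return q.denominator == 1

print(coordinates(Fraction(3, 2), Fraction(-1, 2)))  # (Fraction(2, 1), Fraction(-1, 1))
```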

Next we take a small step that eliminates technical complications from the discussion, proving that any subgroup of a finitely generated abelian group is finitely generated.

Lemma 4.51. Let $\varphi : G \to H$ be a homomorphism of abelian groups. If $\ker\varphi$ and $\operatorname{image}\varphi$ are finitely generated, then $G$ is finitely generated.

PROOF. Let $\{x_1, \dots, x_m\}$ and $\{y_1, \dots, y_n\}$ be respective finite sets of generators for $\ker\varphi$ and $\operatorname{image}\varphi$. For $1 \le j \le n$, choose $x_j'$ in $G$ with $\varphi(x_j') = y_j$. We shall prove that $\{x_1, \dots, x_m, x_1', \dots, x_n'\}$ is a set of generators for $G$. Thus let $x$ be in $G$. Since $\varphi(x)$ is in $\operatorname{image}\varphi$, there exist integers $a_1, \dots, a_n$ with $\varphi(x) = a_1y_1 + \cdots + a_ny_n$. The element $x' = a_1x_1' + \cdots + a_nx_n'$ of $G$ has $\varphi(x') = a_1y_1 + \cdots + a_ny_n = \varphi(x)$. Therefore $\varphi(x - x') = 0$, and there exist integers $b_1, \dots, b_m$ with $x - x' = b_1x_1 + \cdots + b_mx_m$. Hence

$$x = b_1x_1 + \cdots + b_mx_m + x' = b_1x_1 + \cdots + b_mx_m + a_1x_1' + \cdots + a_nx_n'. \qquad \square$$

Proposition 4.52. Any subgroup of a finitely generated abelian group is finitely generated.

PROOF. Let $G$ be finitely generated with a set $\{g_1, \dots, g_n\}$ of $n$ generators, and define $G_k = \mathbb{Z}g_1 + \cdots + \mathbb{Z}g_k$ for $1 \le k \le n$. If $H$ is any subgroup of $G$, define $H_k = H \cap G_k$ for $1 \le k \le n$. We shall prove by induction on $k$ that every $H_k$ is finitely generated, and then the case $k = n$ gives the proposition. For $k = 1$, $G_1 = \mathbb{Z}g_1$ is a cyclic group, and any subgroup of it is cyclic by Proposition 4.4 and hence is finitely generated.

Assume inductively that every subgroup of $G_k$ is known to be finitely generated. Let $q : G_{k+1} \to G_{k+1}/G_k$ be the quotient homomorphism, and let $\varphi = q\big|_{H_{k+1}}$, mapping $H_{k+1}$ into $G_{k+1}/G_k$. Then $\ker\varphi = H_{k+1} \cap G_k$ is a subgroup of $G_k$ and is finitely generated by the inductive hypothesis. Also, $\operatorname{image}\varphi$ is a subgroup of $G_{k+1}/G_k$, which is a cyclic group with generator equal to the coset of $g_{k+1}$. Since a subgroup of a cyclic group is cyclic, $\operatorname{image}\varphi$ is finitely generated. Applying Lemma 4.51 to $\varphi$, we see that $H_{k+1}$ is finitely generated. This completes the induction and the proof. □

A free abelian group has finite rank if it has a finite $\mathbb{Z}$ basis, hence if it is isomorphic to $\mathbb{Z}^n$ for some $n$. The first theorem is that the integer $n$ is determined by the group.

Theorem 4.53. The number of $\mathbb{Z}$ summands in a free abelian group of finite rank is independent of the direct-sum decomposition of the group.

We define this number to be the rank of the free abelian group. Actually, “rank” is a well-defined cardinal in the infinite-rank case as well, because the rank coincides in that case with the cardinality of the group. In any event, Theorem 4.53 follows immediately by two applications of the following lemma.

Lemma 4.54. If $G$ is a free abelian group with a finite $\mathbb{Z}$ basis $x_1, \dots, x_n$, then any linearly independent subset of $G$ has $\le n$ elements.

PROOF. Let $\{y_1, \dots, y_m\}$ be a linearly independent set in $G$. Since $\{x_1, \dots, x_n\}$ is a $\mathbb{Z}$ basis, we can define an $m$-by-$n$ matrix $C$ of integers by $y_i = \sum_{j=1}^n C_{ij}x_j$. As a matrix in $M_{mn}(\mathbb{Q})$, $C$ has rank $\le n$. Consequently if $m > n$, then the rows are linearly dependent over $\mathbb{Q}$, and we can find rational numbers $q_1, \dots, q_m$ not all 0 such that $\sum_{i=1}^m q_iC_{ij} = 0$ for all $j$. Multiplying by a suitable integer to clear fractions, we obtain integers $k_1, \dots, k_m$ not all 0 such that $\sum_{i=1}^m k_iC_{ij} = 0$ for all $j$. Then we have

$$\sum_{i=1}^m k_iy_i = \sum_{i=1}^m k_i\sum_{j=1}^n C_{ij}x_j = \sum_{j=1}^n\Bigl(\sum_{i=1}^m k_iC_{ij}\Bigr)x_j = \sum_{j=1}^n 0\,x_j = 0,$$

in contradiction to the linear independence of $\{y_1, \dots, y_m\}$ over $\mathbb{Z}$. Therefore $m \le n$. □
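The proof of Lemma 4.54 is effective: find a rational dependence among the rows of $C$, then clear denominators. The following Python sketch (the function name is illustrative) does this with exact fractions and Gauss-Jordan elimination on the transpose of $C$. Assuming $m > n$, it returns integers $k_1, \dots, k_m$, not all 0, with $\sum_i k_iC_{ij} = 0$ for all $j$.

```python
from fractions import Fraction
from math import lcm

def integer_row_relation(C):
    """Integers k, not all 0, with sum_i k[i]*C[i][j] == 0 for every column j."""
    m, n = len(C), len(C[0])
    if m <= n:
        raise ValueError("expects more rows than columns")
    # Row j of M is column j of C, so M q = 0 means sum_i q_i C[i][j] = 0 for all j.
    M = [[Fraction(C[i][j]) for i in range(m)] for j in range(n)]
    pivots, r = [], 0
    for c in range(m):
        if r == n:
            break
        p = next((i for i in range(r, n) if M[i][c] != 0), None)
        if p is None:
            continue                      # column c is free
        M[r], M[p] = M[p], M[r]
        pivot = M[r][c]
        M[r] = [x / pivot for x in M[r]]
        for i in range(n):                # clear column c in every other row
            factor = M[i][c]
            if i != r and factor != 0:
                M[i] = [x - factor * y for x, y in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
    # A free unknown exists because rank <= n < m; set it to 1, the others to 0.
    free = next(c for c in range(m) if c not in pivots)
    q = [Fraction(0)] * m
    q[free] = Fraction(1)
    for row, c in enumerate(pivots):
        q[c] = -M[row][free]
    d = lcm(*(x.denominator for x in q))  # clear fractions, as in the proof
    return [int(x * d) for x in q]

print(integer_row_relation([[1, 2], [3, 4], [5, 6]]))  # [1, -2, 1]
```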

Now  we  come  to  the  two  main  results  of  this  section.  The  first  is  a  special 
case  of  the  second  by  Proposition  4.52  and  Lemma  4.54.  The  two  will  be  proved 
together,  and  it  may  help  to  regard  the  proof  of  the  first  as  a  part  of  the  proof  of 
the  second. 

Theorem 4.55. A subgroup $H$ of a free abelian group $G$ of finite rank $n$ is free abelian of rank $\le n$.

Remark.  This  result  persists  in  the  case  of  infinite  rank,  but  we  do  not  need 
the  more  general  result  and  will  not  give  a  proof. 
Theorem 4.56 (Fundamental Theorem of Finitely Generated Abelian Groups). Every finitely generated abelian group is a finite direct sum of cyclic groups. The cyclic groups may be taken to be copies of $\mathbb{Z}$ and various $C_{p^k}$ with $p$ prime, and in this case the cyclic groups are unique up to order and to isomorphism.

Remarks. The main conclusion of the theorem is the decomposition of each finitely generated abelian group into the direct sum of cyclic groups. An alternative decomposition of the given group that forces uniqueness is as the direct sum of copies of $\mathbb{Z}$ and finite cyclic groups $C_{d_1}, \dots, C_{d_r}$ such that $d_1 \mid d_2$, $d_2 \mid d_3$, $\dots$, $d_{r-1} \mid d_r$. A proof of the additional statement appears in the problems at the end of Chapter VIII. The integers $d_1, \dots, d_r$ are sometimes called the elementary divisors of the group.


Let us establish the setting for the proof of Theorem 4.56. Let $G$ be the given group, and say that it has a set of $n$ generators. Proposition 4.17 produces a homomorphism $\varphi : \mathbb{Z}^n \to G$ that carries the standard generators $x_1, \dots, x_n$ of $\mathbb{Z}^n$ to the generators of $G$, and $\varphi$ is onto $G$. Let $H$ be the kernel of $\varphi$. As a subgroup of $\mathbb{Z}^n$, $H$ is finitely generated, by Proposition 4.52. Let $y_1, \dots, y_m$ be generators. Theorem 4.55 predicts that $H$ is in fact free abelian, hence that $\{y_1, \dots, y_m\}$ could be taken to be linearly independent over $\mathbb{Z}$ with $m \le n$, but we do not assume that knowledge in the proof of Theorem 4.56.

The motivation for the main part of the proof of Theorem 4.56 comes from the elementary theory of vector spaces, particularly from the method of using a basis for a finite-dimensional vector space to find a basis of a vector subspace when we know a finite spanning set for the vector subspace. Thus let $V$ be a finite-dimensional vector space over $\mathbb{R}$, with basis $\{x_j\}_{j=1}^n$, and let $U$ be a vector subspace with spanning set $\{y_i\}_{i=1}^m$. To produce a vector-space basis for $U$, we imagine expanding the $y_i$'s as linear combinations of $x_1, \dots, x_n$. We can think symbolically of this expansion as expressing each $y_i$ as the product of a row vector of real numbers times the formal “column vector” $\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}$. The entries of this column vector are vectors, but there is no problem in working with it since this is all just a matter of notation anyway. Then the formal column vector $\begin{pmatrix} y_1 \\ \vdots \\ y_m \end{pmatrix}$ of $m$ members of $U$ equals the product of an $m$-by-$n$ matrix of real numbers times the formal column vector $\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}$. We know from Chapter II that the procedure for finding a basis of $U$ is to row reduce this matrix of real numbers. The nonzero rows of the result determine a basis of the span of the $m$ vectors we have used, and this basis is related tidily to the given basis for $V$. We can compare the two bases
to understand the relationship between $U$ and $V$. To prove Theorem 4.56, we would like to use the same procedure, but we have to work with an integer matrix and avoid division. This means that only two of the three usual row operations are fully available for the row reduction; division of a row by an integer is allowable only when the integer is $\pm 1$. A partial substitute for division comes by using the steps of the Euclidean algorithm via the division algorithm (Proposition 1.1), but even that is not enough. For example, if the $m$-by-$n$ matrix is $\begin{pmatrix} 2 & 1 & 1 \\ 0 & 0 & 3 \end{pmatrix}$, no further row reduction is possible with integer operations. However, the equations tell us that $H$ is the subgroup of $\mathbb{Z}^3$ generated by $(2, 1, 1)$ and $(0, 0, 3)$, and it is not at all clear how to write $\mathbb{Z}^3/H$ as a direct sum of cyclic groups.

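The row operations have the effect of changing the set of generators of $H$ while maintaining the fact that they generate $H$. What is needed is to allow also column reduction with integer operations. Steps of this kind have the effect of changing the $\mathbb{Z}$ basis of $\mathbb{Z}^n$. When steps of this kind are allowed, we can produce new generators of $H$ and a new basis of $\mathbb{Z}^n$ so that the two can be compared. With the example above, suitable column operations are

$$\begin{pmatrix} 2 & 1 & 1 \\ 0 & 0 & 3 \end{pmatrix} \to \begin{pmatrix} 1 & 2 & 1 \\ 0 & 0 & 3 \end{pmatrix} \to \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 3 \end{pmatrix} \to \begin{pmatrix} 1 & 0 & 0 \\ 0 & 3 & 0 \end{pmatrix}.$$

In terms of the new basis $x_1', x_2', x_3'$ of $\mathbb{Z}^3$, the equations say that $y_1 = x_1'$ and $y_2 = 3x_2'$. Thus $H$ is the subgroup $\mathbb{Z} \oplus 3\mathbb{Z} \oplus 0\mathbb{Z}$, nicely aligned with $\mathbb{Z}^3 = \mathbb{Z} \oplus \mathbb{Z} \oplus \mathbb{Z}$. The quotient is $(\mathbb{Z}/\mathbb{Z}) \oplus (\mathbb{Z}/3\mathbb{Z}) \oplus (\mathbb{Z}/0\mathbb{Z}) \cong C_3 \oplus \mathbb{Z}$.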

The proof of Theorem 4.56 will make use of an algorithm that uses row and column operations involving only allowable divisions and that converts the matrix $C$ of coefficients so that its nonzero entries are the diagonal entries $C_{ii}$ for $1 \le i \le r$ and no other entries. The algorithm in principle can be very slow, and it may be helpful to see what it does in an ordinary example.


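Example. Suppose that the relationship between generators $y_1, y_2, y_3$ of $H$ and the standard $\mathbb{Z}$ basis $\{x_1, x_2\}$ of $\mathbb{Z}^2$ is

$$\begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix} = C \begin{pmatrix} x_1 \\ x_2 \end{pmatrix},$$

where $C$ is a 3-by-2 matrix of integers whose first column is $(3, 7, 5)$.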

In row reduction in vector-space theory, we would start by dividing the first row of $C$ by 3, but division by 3 is not available in the present context. Our target for the upper-left entry is $\mathrm{GCD}(3, 7, 5) = 1$, and we use the division algorithm one step at a time to get there. To begin with, it says that $7 = 2 \cdot 3 + 1$ and hence $7 - 2 \cdot 3 = 1$. The first step of row reduction is then to replace the second row by the difference of it and 2 times the first row. The result can be achieved by left multiplication by

$$\begin{pmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix},$$

and we write this step as “left by” that matrix. It changes the first column of $C$ from $(3, 7, 5)$ to $(3, 1, 5)$.


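The entry 1 in the first column is our target for this stage since $\mathrm{GCD}(3, 7, 5) = 1$. The next step interchanges two rows to move the 1 to the upper left entry, and the subsequent step uses the 1 to eliminate the other entries of the first column:

$$\text{left by } \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad \text{then left by } \begin{pmatrix} 1 & 0 & 0 \\ -3 & 1 & 0 \\ -5 & 0 & 1 \end{pmatrix}.$$

These change the first column from $(3, 1, 5)$ to $(1, 3, 5)$ and then to $(1, 0, 0)$.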


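The algorithm next seeks to eliminate the off-diagonal entry 3 in the first row. This is done by a column operation:

$$\text{right by } \begin{pmatrix} 1 & -3 \\ 0 & 1 \end{pmatrix},$$

which subtracts 3 times the first column from the second and leaves the first row equal to $(1, 0)$.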


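With two further row operations we are done:

$$\text{left by } \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & -1 \\ 0 & 0 & 1 \end{pmatrix}, \qquad \text{then left by } \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 3 & 1 \end{pmatrix}.$$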


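Our steps are summarized by the fact that the matrix $A$ with

$$A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 3 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & -1 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ -3 & 1 & 0 \\ -5 & 0 & 1 \end{pmatrix} \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$

has $AC\begin{pmatrix} 1 & -3 \\ 0 & 1 \end{pmatrix}$ with nonzero entries only in diagonal positions, and by the fact that the integer matrices to the left and right of $C$ have determinant $\pm 1$. The determinant condition ensures that $A^{-1}$ and $\begin{pmatrix} 1 & -3 \\ 0 & 1 \end{pmatrix}^{-1}$ have integer entries, according to Cramer's rule (Proposition 2.38).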
Lemma 4.57. If $C$ is an $m$-by-$n$ matrix of integers, then there exist an $m$-by-$m$ matrix $A$ of integers with determinant $\pm 1$ and an $n$-by-$n$ matrix $B$ of integers with determinant $\pm 1$ such that for some $r \ge 0$, the nonzero entries of $D = ACB$ are exactly the diagonal entries $D_{11}, D_{22}, \dots, D_{rr}$.

Proof. Given $C$, choose $(i, j)$ with $C_{ij} \neq 0$ but $|C_{ij}|$ as small as possible. (If $C = 0$, the algorithm terminates.) Possibly by interchanging two rows and/or then two columns (a left multiplication with determinant $-1$ and then a right multiplication with determinant $-1$), we may assume that $(i, j) = (1, 1)$. By the division algorithm write, for each $i > 1$,

$$C_{i1} = q_iC_{11} + r_i \quad\text{with } 0 \le r_i < |C_{11}|,$$

and replace the $i$th row by the difference of the $i$th row and $q_i$ times the first row (a left multiplication). If some $r_i$ is not 0, the result will leave a nonzero entry in the first column that is $< |C_{11}|$ in absolute value. Permute the least such $r_i \neq 0$ to the upper left and repeat the process. Since the least absolute value is going down, this process at some point terminates with all $r_i$ equal to 0. The first column then has a nonzero diagonal entry and is otherwise 0.

Now consider $C_{1j}$ and apply the division algorithm and column operations in similar fashion in order to process the first row. If we get a smaller nonzero remainder, permute the smallest one to the first column. Repeat this process until the first row is 0 except for the entry $C_{11}$. Continue alternately with row and column operations in this fashion until both $C_{1j} = 0$ for $j > 1$ and $C_{i1} = 0$ for $i > 1$.

Repeat the algorithm for the $(m-1)$-by-$(n-1)$ matrix consisting of rows 2 through $m$ and columns 2 through $n$, and continue inductively. The algorithm terminates when the reduced-in-size matrix is either empty or all 0. At this point the original matrix has been converted into the desired “diagonal form.” □
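The proof of Lemma 4.57 is an algorithm, and a direct transcription may help. The Python sketch below is one possible rendering under stated assumptions: matrices are lists of rows, every row operation is also applied to an identity matrix that accumulates $A$, and every column operation to one that accumulates $B$, so that $D = ACB$ holds at every step. It uses Python's floor division, whose remainder takes the sign of the divisor; that differs from the convention $0 \le r_i < |C_{11}|$ in the proof, but it still gives $|r_i| < |C_{11}|$, which is all the termination argument uses. The name `diagonalize` is hypothetical.

```python
def identity(k):
    """k-by-k identity matrix as a list of rows."""
    return [[int(i == j) for j in range(k)] for i in range(k)]


def matmul(X, Y):
    """Product of two integer matrices stored as lists of rows."""
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def diagonalize(C):
    """Follow the proof of Lemma 4.57 and return (A, D, B) with D = A C B.

    A and B are products of swaps (determinant -1) and moves of the form
    "subtract q times one row (column) from another" (determinant 1).
    """
    m = len(C)
    n = len(C[0]) if m else 0
    D = [row[:] for row in C]
    A, B = identity(m), identity(n)

    def swap_rows(i, k):               # left multiplication: acts on D and A
        for M in (D, A):
            M[i], M[k] = M[k], M[i]

    def swap_cols(j, k):               # right multiplication: acts on D and B
        for M in (D, B):
            for row in M:
                row[j], row[k] = row[k], row[j]

    def sub_row(i, t, q):              # row_i -= q * row_t in D and A
        for M in (D, A):
            M[i] = [a - q * b for a, b in zip(M[i], M[t])]

    def sub_col(j, t, q):              # col_j -= q * col_t in D and B
        for M in (D, B):
            for row in M:
                row[j] -= q * row[t]

    for t in range(min(m, n)):
        while True:
            nonzero = [(abs(D[i][j]), i, j)
                       for i in range(t, m) for j in range(t, n) if D[i][j] != 0]
            if not nonzero:            # remaining block is 0: D is in final form
                return A, D, B
            _, i, j = min(nonzero)     # entry of least absolute value
            swap_rows(t, i)
            swap_cols(t, j)
            pivot = D[t][t]
            for i in range(t + 1, m):  # division algorithm down the column
                sub_row(i, t, D[i][t] // pivot)
            for j in range(t + 1, n):  # ... and along the row
                sub_col(j, t, D[t][j] // pivot)
            column_clear = all(D[i][t] == 0 for i in range(t + 1, m))
            row_clear = all(D[t][j] == 0 for j in range(t + 1, n))
            if column_clear and row_clear:
                break                  # pass to the smaller block
    return A, D, B


C = [[2, 4, 4], [-6, 6, 12], [10, -4, -16]]
A, D, B = diagonalize(C)
print(D)   # [[2, 0, 0], [0, 6, 0], [0, 0, 12]]
```

The diagonal entries produced depend on the choices made along the way; Lemma 4.57 asserts only that some such $A$, $B$, and $D$ exist, not that $D$ is unique.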

Lemma 4.58. Let $G_1, \dots, G_n$ be abelian groups, and for $1 \le j \le n$ let $H_j$ be a subgroup of $G_j$. Then
$$(G_1 \oplus \cdots \oplus G_n)/(H_1 \oplus \cdots \oplus H_n) \cong (G_1/H_1) \oplus \cdots \oplus (G_n/H_n).$$

Proof. Let $\varphi : G_1 \oplus \cdots \oplus G_n \to (G_1/H_1) \oplus \cdots \oplus (G_n/H_n)$ be the homomorphism defined by $\varphi(g_1, \dots, g_n) = (g_1H_1, \dots, g_nH_n)$. The mapping $\varphi$ is onto $(G_1/H_1) \oplus \cdots \oplus (G_n/H_n)$, and the kernel is $H_1 \oplus \cdots \oplus H_n$. Then Corollary 4.12 shows that $\varphi$ descends to the required isomorphism. □

Proof  of  Theorem  4.55  and  main  conclusion  of  Theorem  4.56.  Given  G 
with  n  generators,  we  set  up  matters  as  indicated  immediately  after  the  statement 
of Theorem 4.56, writing
$$\begin{pmatrix}y_1\\ \vdots\\ y_m\end{pmatrix} = C\begin{pmatrix}x_1\\ \vdots\\ x_n\end{pmatrix},$$
where $x_1, \dots, x_n$ are the standard generators of $\mathbb{Z}^n$, $y_1, \dots, y_m$ are the generators of the kernel of the homomorphism from $\mathbb{Z}^n$ onto $G$, and $C$ is a matrix of integers.
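Applying Lemma 4.57, let $A$ and $B$ be square integer matrices of determinant $\pm1$ such that $D = ACB$ is diagonal as in the statement of the lemma. Define
$$\begin{pmatrix}z_1\\ \vdots\\ z_m\end{pmatrix} = A\begin{pmatrix}y_1\\ \vdots\\ y_m\end{pmatrix} \qquad\text{and}\qquad \begin{pmatrix}u_1\\ \vdots\\ u_n\end{pmatrix} = B^{-1}\begin{pmatrix}x_1\\ \vdots\\ x_n\end{pmatrix}.$$
Substitution gives
$$\begin{pmatrix}z_1\\ \vdots\\ z_m\end{pmatrix} = AC\begin{pmatrix}x_1\\ \vdots\\ x_n\end{pmatrix} = (ACB)B^{-1}\begin{pmatrix}x_1\\ \vdots\\ x_n\end{pmatrix} = D\begin{pmatrix}u_1\\ \vdots\\ u_n\end{pmatrix}.$$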

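If $(c_1\ \cdots\ c_n)$ and $(d_1\ \cdots\ d_n) = (c_1\ \cdots\ c_n)B^{-1}$ are row vectors, then the formula
$$c_1u_1 + \cdots + c_nu_n = (c_1\ \cdots\ c_n)\begin{pmatrix}u_1\\ \vdots\\ u_n\end{pmatrix} = (c_1\ \cdots\ c_n)B^{-1}\begin{pmatrix}x_1\\ \vdots\\ x_n\end{pmatrix} = d_1x_1 + \cdots + d_nx_n \tag{$*$}$$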


shows that $\{u_1, \dots, u_n\}$ generates the same subgroup of $\mathbb{Z}^n$ as $\{x_1, \dots, x_n\}$. Since $(c_1\ \cdots\ c_n)$ is nonzero if and only if $(d_1\ \cdots\ d_n)$ is nonzero, the formula $(*)$ shows also that the linear independence of $\{x_1, \dots, x_n\}$ implies that of $\{u_1, \dots, u_n\}$. Hence $\{u_1, \dots, u_n\}$ is a $\mathbb{Z}$ basis of $\mathbb{Z}^n$. Similarly $\{y_1, \dots, y_m\}$ and $\{z_1, \dots, z_m\}$ generate the same subgroup $H$ of $\mathbb{Z}^n$. Therefore we can compare $H$ and $\mathbb{Z}^n$ using $\{z_1, \dots, z_m\}$ and $\{u_1, \dots, u_n\}$. Since $D$ is diagonal, the equations relating $\{z_1, \dots, z_m\}$ and $\{u_1, \dots, u_n\}$ are $z_j = D_{jj}u_j$ for $j \le \min(m, n)$ and $z_j = 0$ for $\min(m, n) < j \le m$. If $q = \min(m, n)$, then we see that
$$H = \sum_{i=1}^{m}\mathbb{Z}z_i = \sum_{i=1}^{q}D_{ii}\mathbb{Z}u_i + \sum_{i=q+1}^{m} 0 = \sum_{i=1}^{q}D_{ii}\mathbb{Z}u_i.$$

Since the set $\{u_1, \dots, u_q\}$ is linearly independent over $\mathbb{Z}$, this sum exhibits $H$ as given by
$$H = D_{11}\mathbb{Z}u_1 \oplus \cdots \oplus D_{qq}\mathbb{Z}u_q$$
with the nonzero members of $\{D_{11}u_1, \dots, D_{qq}u_q\}$ as a $\mathbb{Z}$ basis. Consequently $H$ has been exhibited as free abelian of rank $\le q \le n$. This proves Theorem 4.55. Applying Lemma 4.58 to the quotient $\mathbb{Z}^n/H$ and letting $D_{11}, \dots, D_{rr}$ be the nonzero diagonal entries of $D$, we see that $H$ has rank $r$, and we obtain an expansion of $G$ in terms of cyclic groups as
$$G \cong C_{|D_{11}|} \oplus \cdots \oplus C_{|D_{rr}|} \oplus \mathbb{Z}^{n-r}.$$
This proves the main conclusion of Theorem 4.56. □
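Once $D$ is in the form of Lemma 4.57, the last display is bookkeeping. A minimal Python sketch, assuming $D$ is given as a list of rows with $n$ columns (the function name `cyclic_decomposition` is hypothetical), lists the orders $|D_{ii}|$ of the nontrivial cyclic factors and the number $n - r$ of factors $\mathbb{Z}$; entries $\pm1$ contribute only the trivial group $C_1$ and are dropped.

```python
def cyclic_decomposition(D, n):
    """Read G = Z^n / H off a relation matrix D in the form of Lemma 4.57.

    Returns (orders, free_rank): G is isomorphic to the direct sum of the
    cyclic groups C_d for d in orders, together with Z^free_rank.
    """
    diagonal = [D[i][i] for i in range(min(len(D), n))]
    nonzero = [abs(d) for d in diagonal if d != 0]
    r = len(nonzero)                       # the rank of H
    orders = [d for d in nonzero if d != 1]  # C_1 is trivial
    return orders, n - r


print(cyclic_decomposition([[1, 0], [0, -4], [0, 0]], 2))   # ([4], 0): G is C_4
print(cyclic_decomposition([[2, 0, 0], [0, 0, 0]], 3))      # ([2], 2): G is C_2 + Z^2
```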
Proof of the decomposition with cyclic groups of prime-power order. It is enough to prove that if $m = \prod_{j=1}^{n} p_j^{k_j}$ with the $p_j$ equal to distinct primes, then $\mathbb{Z}/m\mathbb{Z} \cong (\mathbb{Z}/p_1^{k_1}\mathbb{Z}) \oplus \cdots \oplus (\mathbb{Z}/p_n^{k_n}\mathbb{Z})$. This is a variant of the Chinese Remainder Theorem (Corollary 1.9). For the proof let
$$\varphi : \mathbb{Z} \to (\mathbb{Z}/p_1^{k_1}\mathbb{Z}) \oplus \cdots \oplus (\mathbb{Z}/p_n^{k_n}\mathbb{Z})$$
be the homomorphism given by $\varphi(s) = (s \bmod p_1^{k_1}, \dots, s \bmod p_n^{k_n})$ for $s \in \mathbb{Z}$. Since $\varphi(m) = (0, \dots, 0)$, $\varphi$ descends to a homomorphism
$$\bar\varphi : \mathbb{Z}/m\mathbb{Z} \to (\mathbb{Z}/p_1^{k_1}\mathbb{Z}) \oplus \cdots \oplus (\mathbb{Z}/p_n^{k_n}\mathbb{Z}).$$
The map $\bar\varphi$ is one-one because if $\bar\varphi(s + m\mathbb{Z}) = 0$, then $p_j^{k_j}$ divides $s$ for all $j$. Since the $p_j^{k_j}$ are relatively prime in pairs, their product $m$ divides $s$; that is, $s \equiv 0 \bmod m$. The map $\bar\varphi$ is onto since it is one-one and since the finite sets $\mathbb{Z}/m\mathbb{Z}$ and $(\mathbb{Z}/p_1^{k_1}\mathbb{Z}) \oplus \cdots \oplus (\mathbb{Z}/p_n^{k_n}\mathbb{Z})$ both have $m$ elements. □
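The map $\bar\varphi$ can be tabulated for a specific modulus. The Python sketch below, in which `prime_power_factors` and `crt_images` are illustrative names and factoring is done by plain trial division, computes the prime powers $p_j^{k_j}$ of $m$ and the images of $0, 1, \dots, m-1$, then checks that these images are distinct, which is the one-one property used in the proof.

```python
import math


def prime_power_factors(m):
    """Trial division: the prime powers p_j^{k_j} whose product is m."""
    factors, p = [], 2
    while p * p <= m:
        if m % p == 0:
            q = 1
            while m % p == 0:
                m //= p
                q *= p
            factors.append(q)
        p += 1
    if m > 1:
        factors.append(m)
    return factors


def crt_images(m):
    """The moduli p_j^{k_j} and the image of s + mZ under phi-bar for s = 0, ..., m-1."""
    moduli = prime_power_factors(m)
    assert math.prod(moduli) == m
    return moduli, [tuple(s % q for q in moduli) for s in range(m)]


moduli, images = crt_images(360)
print(moduli)                      # [8, 9, 5]
print(len(set(images)) == 360)     # True: one-one, hence onto by counting
```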

Proof of UNIQUENESS of the decomposition. Write $G = H \oplus T$, where $H$ is the sum of the factors $\mathbb{Z}$,
$$T = (\mathbb{Z}/p_1^{k_1}\mathbb{Z}) \oplus \cdots \oplus (\mathbb{Z}/p_n^{k_n}\mathbb{Z}),$$
and the $p_j$'s are not necessarily distinct. The subgroup $T$ is the subgroup of elements of finite order in $G$, and it is well defined independently of the decomposition of $G$ as the direct sum of cyclic groups. The quotient $G/T \cong H$ is free abelian of finite rank, and its rank $s$ is well defined by Theorem 4.53. Thus the number $s$ of factors of $\mathbb{Z}$ in the decomposition of $G$ is uniquely determined, and we need only consider uniqueness of the decomposition of the finite abelian group $T$.

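For $p$ prime, the elements of $T$ of order $p^a$ for some $a$ are those in the sum of the groups $\mathbb{Z}/p_j^{k_j}\mathbb{Z}$ for which $p_j = p$, and we are reduced to considering a group
$$H = \mathbb{Z}/p^{l_1}\mathbb{Z} \oplus \cdots \oplus \mathbb{Z}/p^{l_M}\mathbb{Z}$$
with $p$ fixed and $l_1 \le \cdots \le l_M$. The set $p^jH$ of $p^j$th multiples of elements of $H$ is a subgroup of $H$ and is given by $p^jH \cong \mathbb{Z}/p^{l_i-j}\mathbb{Z} \oplus \cdots \oplus \mathbb{Z}/p^{l_M-j}\mathbb{Z}$ if $i$ is the first index with $l_i > j$, while $p^{j+1}H \cong \mathbb{Z}/p^{l_{i'}-j-1}\mathbb{Z} \oplus \cdots \oplus \mathbb{Z}/p^{l_M-j-1}\mathbb{Z}$ if $i'$ is the first index with $l_{i'} > j+1$. Therefore Lemma 4.58 gives
$$p^jH/p^{j+1}H \cong \bigoplus_{t \ge i} \big(p^j(\mathbb{Z}/p^{l_t}\mathbb{Z})\big)\big/\big(p^{j+1}(\mathbb{Z}/p^{l_t}\mathbb{Z})\big),$$
in which the $t$th term is a cyclic group of order $p^{l_t-j}$ modulo its subgroup of order $p^{l_t-j-1}$. Each term of $p^jH/p^{j+1}H$ therefore has order $p$, and thus
$$|p^jH/p^{j+1}H| = p^{\#\{t \mid l_t > j\}}.$$
Hence $H$ determines, for each $j \ge 1$, the number $\#\{t \mid l_t > j-1\} - \#\{t \mid l_t > j\}$ of indices $t$ with $l_t = j$; thus $H$ determines the integers $l_1, \dots, l_M$, and uniqueness is proved. □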
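The uniqueness argument says that the orders $|p^jH/p^{j+1}H|$ alone pin down $l_1, \dots, l_M$. The Python sketch below illustrates this by brute force for small groups: it builds $H$ from a list of exponents only in order to count the subgroups $p^jH$, and then rebuilds the exponents from those counts, using the rule that the number of $t$ with $l_t = j+1$ is $\#\{t \mid l_t > j\} - \#\{t \mid l_t > j+1\}$. The function names are hypothetical, and enumerating all of $H$ is practical only when $|H|$ is small.

```python
from itertools import product


def multiples(p, exps, j):
    """The subgroup p^j H of H = Z/p^{l_1} + ... + Z/p^{l_M}, as a set of tuples."""
    mods = [p ** l for l in exps]
    return {tuple(p ** j * x % q for x, q in zip(h, mods))
            for h in product(*(range(q) for q in mods))}


def log_p(x, p):
    """Exponent e with p**e == x, for x a power of p."""
    e = 0
    while x > 1:
        x //= p
        e += 1
    return e


def exponents_from_quotients(p, exps):
    """Recover l_1 <= ... <= l_M using only the orders |p^j H / p^{j+1} H|."""
    above = []   # above[j] = #{t : l_t > j} = log_p |p^j H / p^{j+1} H|
    j = 0
    while len(multiples(p, exps, j)) > 1:
        quotient_order = len(multiples(p, exps, j)) // len(multiples(p, exps, j + 1))
        above.append(log_p(quotient_order, p))
        j += 1
    above.append(0)
    # the number of t with l_t = j + 1 is above[j] - above[j + 1]
    return [j + 1 for j in range(len(above) - 1)
            for _ in range(above[j] - above[j + 1])]


print(exponents_from_quotients(2, [3, 1, 2, 2]))   # [1, 2, 2, 3]
```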
10. Sylow Theorems

This  section  continues  the  use  of  group  actions  to  obtain  results  concerning 
structure  theory  for  abstract  groups.  We  shall  prove  the  three  Sylow  Theorems, 
which  are  a  starting  point  for  investigations  of  the  structure  of  finite  groups  that 
are  deeper  than  those  in  Sections  6  and  7.  We  state  the  three  theorems  as  the  parts 
of  Theorem  4.59. 

Theorem 4.59 (Sylow Theorems). Let $G$ be a finite group of order $p^mr$, where $p$ is prime and $p$ does not divide $r$. Then

(a) $G$ contains a subgroup of order $p^m$, and any subgroup of $G$ of order $p^l$ with $0 \le l \le m$ is contained in a subgroup of order $p^m$,

(b) any two subgroups of order $p^m$ in $G$ are conjugate in $G$, i.e., any two such subgroups $P_1$ and $P_2$ have $P_2 = aP_1a^{-1}$ for some $a \in G$,

(c) the number of subgroups of order $p^m$ is of the form $kp + 1$ and divides $r$.

Remark. A subgroup of order $p^m$ as in the theorem is called a Sylow $p$-subgroup of $G$. A consequence of (a) when $m \ge 1$ is that $G$ has a subgroup of order $p$; this special case is sometimes called Cauchy’s Theorem in group theory.

Before  coming  to  the  proof,  let  us  carefully  give  two  simple  applications 
to  structure  theory.  The  applications  combine  Theorem  4.59,  some  results  of 
Sections  6  and  7,  and  Problems  35-38  and  45-48  at  the  end  of  the  chapter. 

Proposition 4.60. If $p$ and $q$ are primes with $p < q$, then there exists a nonabelian group of order $pq$ if and only if $p$ divides $q - 1$, and in this case the nonabelian group is unique up to isomorphism. It may be taken to be a semidirect product of the cyclic groups $C_p$ and $C_q$ with $C_q$ normal.

Remark. It follows from Theorem 4.56 that the only abelian group of order $pq$, up to isomorphism, is $C_p \times C_q \cong C_{pq}$. If $p = 2$ in the proposition, then $q$ is odd and $p$ divides $q - 1$; the proposition yields the dihedral group $D_q$. For $p > 2$, the divisibility condition may or may not hold: for $pq = 15$, the condition does not hold, and hence every group of order 15 is cyclic. For $pq = 21$, the condition does hold, and there exists a nonabelian group of order 21; this group was constructed explicitly in Example 2 in Section 7.

PROOF. Existence of a nonabelian group of order $pq$, together with the semidirect-product structure, is established by Proposition 4.46 if $p$ divides $q - 1$. Let us see uniqueness and the necessity of the condition that $p$ divide $q - 1$.

If $G$ has order $pq$, Theorem 4.59a shows that $G$ has a Sylow $p$-subgroup $H_p$ and a Sylow $q$-subgroup $H_q$. Corollary 4.9 shows that these two groups are cyclic.
The conjugates of $H_q$ are Sylow $q$-subgroups, and Theorem 4.59c shows that the number of such conjugates is of the form $kq + 1$ and divides $p$. Since $p < q$, $k = 0$. Therefore $H_q$ is normal. (Alternatively, one can apply Proposition 4.36 to see that $H_q$ is normal.)

Each element of $G$ is uniquely a product $ab$ with $a$ in $H_p$ and $b$ in $H_q$. For the uniqueness, if $a_1b_1 = a_2b_2$, then $a_2^{-1}a_1 = b_2b_1^{-1}$ is an element of $H_p \cap H_q$. Its order must divide both $p$ and $q$ and hence must be 1. Thus the $pq$ products $ab$ with $a$ in $H_p$ and $b$ in $H_q$ are all different. Since the number of them equals the order of $G$, every member of $G$ is such a product. By Proposition 4.44, $G$ is a semidirect product of $H_p$ and $H_q$.

If the action of $H_p$ on $H_q$ is nontrivial, then Problem 37 at the end of the chapter shows that $p$ divides $q - 1$, and Problem 38 shows that the group is unique up to isomorphism. On the other hand, if the action is trivial, then $G$ is certainly abelian. □
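For a concrete instance of Proposition 4.60, the Python sketch below builds a semidirect product of $C_7$ by $C_3$ and counts its Sylow subgroups. It assumes the usual description of such a group by elements $x$ of order $q$ and $y$ of order $p$ with $yxy^{-1} = x^r$, where $r$ has multiplicative order $p$ modulo $q$; the pair $(b, a)$ stands for $x^by^a$, and the helper names are illustrative. For order $21 = 3 \cdot 7$ one expects 7 Sylow 3-subgroups ($7 = 2 \cdot 3 + 1$, dividing 7) and a single, hence normal, Sylow 7-subgroup, as Theorem 4.59c predicts.

```python
def semidirect(p, q):
    """Nonabelian group of order pq for primes p < q with p | q - 1.

    (b, a) stands for x^b y^a, with x of order q, y of order p,
    and y x y^{-1} = x^r where r has multiplicative order p mod q.
    """
    assert (q - 1) % p == 0, "need p | q - 1"
    r = next(c for c in range(2, q) if pow(c, p, q) == 1)   # r != 1 and r^p = 1 mod q
    elements = [(b, a) for b in range(q) for a in range(p)]

    def mul(g, h):
        # x^b1 y^a1 x^b2 y^a2 = x^(b1 + r^a1 * b2) y^(a1 + a2)
        (b1, a1), (b2, a2) = g, h
        return ((b1 + pow(r, a1, q) * b2) % q, (a1 + a2) % p)

    return elements, mul


def element_order(g, mul, identity=(0, 0)):
    k, x = 1, g
    while x != identity:
        x, k = mul(x, g), k + 1
    return k


def sylow_count(elements, mul, prime):
    """Number of subgroups of prime order; each one has prime - 1 generators."""
    of_order = [g for g in elements if element_order(g, mul) == prime]
    return len(of_order) // (prime - 1)


G, mul = semidirect(3, 7)
print(len(G))                                              # 21
print(any(mul(g, h) != mul(h, g) for g in G for h in G))   # True: nonabelian
print(sylow_count(G, mul, 3), sylow_count(G, mul, 7))      # 7 1
```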

Proposition 4.61. If $G$ is a group of order 12, then $G$ contains a subgroup $H$ of order 3 and a subgroup $K$ of order 4, and at least one of them is normal. Consequently there are exactly five groups of order 12, up to isomorphism: two abelian and three nonabelian.

Remark. The second statement follows from the first, as a consequence of Problems 45-48 at the end of the chapter. Those problems show how to construct the groups.

PROOF.  Theorem  4.59a  shows  that  H  may  be  taken  to  be  a  Sylow  3-subgroup 
and  K  may  be  taken  to  be  a  Sylow  2-subgroup.  We  have  to  prove  that  either  H 
or  K  is  normal. 

Suppose that $H$ is not normal. Theorem 4.59c shows that the number of Sylow 3-subgroups is of the form $3k + 1$ and divides 4. The subgroup $H$, not being normal, fails to equal one of its conjugates, which will be another Sylow 3-subgroup; hence $k > 0$. Therefore $k = 1$, and there are four Sylow 3-subgroups. The intersection of any two such subgroups is a subgroup of both and must be trivial since 3 is prime. Thus the set-theoretic union of the Sylow 3-subgroups accounts for $4 \cdot 2 + 1 = 9$ elements. None of these elements apart from the identity lies in $K$, and thus $K$ contributes 3 further elements, for a total of 12. Thus every element of $G$ lies in $K$ or in a conjugate of $H$. Every conjugate of $K$ has order 4, hence contains no element of order 3, and so lies in $K$. Consequently $K$ equals every conjugate of $K$, and $K$ is normal. □
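The counting in this proof can be watched in the alternating group $A_4$, a group of order 12 whose Sylow 3-subgroups are not normal (the choice of $A_4$ is ours, for illustration; the text does not single out a group here). In the Python sketch below, permutations of $\{0, 1, 2, 3\}$ are stored as tuples; the sketch finds the four Sylow 3-subgroups, and the subgroup $K$ of elements of order 1 or 2, and checks that $K$ is normal. Helper names are illustrative.

```python
from itertools import permutations


def compose(s, t):
    """(st)(i) = s(t(i)) for permutations of {0, ..., n-1} stored as tuples."""
    return tuple(s[t[i]] for i in range(len(t)))


def inverse(s):
    inv = [0] * len(s)
    for i, si in enumerate(s):
        inv[si] = i
    return tuple(inv)


def order(s):
    e, x, k = tuple(range(len(s))), s, 1
    while x != e:
        x, k = compose(s, x), k + 1
    return k


def is_even(s):
    inversions = sum(s[i] > s[j] for i in range(len(s)) for j in range(i + 1, len(s)))
    return inversions % 2 == 0


G = [s for s in permutations(range(4)) if is_even(s)]       # A_4, order 12
e = tuple(range(4))
sylow3 = {frozenset({e, s, compose(s, s)}) for s in G if order(s) == 3}
K = {s for s in G if order(s) in (1, 2)}                      # a subgroup of order 4
K_normal = all(compose(compose(g, k), inverse(g)) in K for g in G for k in K)
print(len(G), len(sylow3), len(K), K_normal)                  # 12 4 4 True
```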

Let  us  see  where  we  are  with  classifying  finite  groups  of  certain  orders,  up  to 
isomorphism.  A  group  of  order  p  is  cyclic  by  Corollary  4.9,  and  a  group  of  order 
$p^2$ is abelian by Corollary 4.39. Groups of order $pq$ are settled by Proposition 4.60. Thus for $p$ and $q$ prime, we know the structure of all groups of order $p$, $p^2$, and $pq$. Problems 39-44 at the end of the chapter tell us the structure of the groups of order 8, and Proposition 4.61 and Problems 45-48 tell us the structure
of  the  groups  of  order  12.  In  particular,  the  table  at  the  end  of  Section  1,  which 
gives  examples  of  nonisomorphic  groups  of  order  at  most  15,  is  complete  except 
for  the  one  group  of  order  12  that  is  discussed  in  Problem  48. 

Problems  30-34  and  49-54  at  the  end  of  the  chapter  go  in  the  direction  of 
classifying  finite  groups  of  certain  other  orders. 

Now we return to Theorem 4.59. The proof of the theorem makes use of the theory of group actions as in Section 6. In fact, the proof of existence of Sylow $p$-subgroups is just an elaboration of the argument used to prove Corollary 4.38, saying that a group of prime-power order has a nontrivial center. The relevant action for the existence part of the proof is the one $(g, x) \mapsto gxg^{-1}$ given by conjugation of the elements of the group, the orbit of $x$ being the conjugacy class $\mathrm{Cl}(x)$. Proposition 4.37 shows that $|G| = |\mathrm{Cl}(x)||Z_G(x)|$, where $Z_G(x)$ is the centralizer of $x$. Since the disjoint union of the conjugacy classes is all of $G$, and since the classes with only one element are exactly the singletons $\{x\}$ with $x$ in the center $Z_G$, we have

$$|G| = |Z_G| + \sum_{\substack{\text{representatives } x_j \text{ of each}\\ \text{conjugacy class with } |\mathrm{Cl}(x_j)| \ne 1}} |G|/|Z_G(x_j)|,$$

a formula sometimes called the class equation of $G$.
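To see the class equation on a small example, the Python sketch below takes the symmetry group of a square, realized as permutations of its vertices $0, 1, 2, 3$ (a hypothetical choice of group, not from the text). It computes the center, the conjugacy classes, and the centralizer orders, and checks that $|Z_G| + \sum |G|/|Z_G(x_j)|$ returns $|G| = 8$.

```python
def compose(s, t):
    """(st)(i) = s(t(i)) for permutations stored as tuples."""
    return tuple(s[t[i]] for i in range(len(t)))


def inverse(s):
    inv = [0] * len(s)
    for i, si in enumerate(s):
        inv[si] = i
    return tuple(inv)


def generate(gens):
    """Subgroup generated by gens; in a finite group, closure under products suffices."""
    e = tuple(range(len(gens[0])))
    group, frontier = {e}, [e]
    while frontier:
        x = frontier.pop()
        for g in gens:
            y = compose(g, x)
            if y not in group:
                group.add(y)
                frontier.append(y)
    return group


# Symmetries of a square with vertices 0, 1, 2, 3 in cyclic order (order 8).
rotation, reflection = (1, 2, 3, 0), (0, 3, 2, 1)
G = generate([rotation, reflection])

center = {z for z in G if all(compose(z, g) == compose(g, z) for g in G)}
classes = {frozenset(compose(compose(g, x), inverse(g)) for g in G) for x in G}
big_classes = [c for c in classes if len(c) > 1]


def centralizer_order(x):
    return sum(compose(x, g) == compose(g, x) for g in G)


rhs = len(center) + sum(len(G) // centralizer_order(next(iter(c))) for c in big_classes)
print(len(G), len(center), sorted(len(c) for c in big_classes), rhs)   # 8 2 [2, 2, 2] 8
```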

Proof of existence of Sylow $p$-subgroups in Theorem 4.59a. We induct on $|G|$, the base case being $|G| = 1$. Suppose that existence holds for groups of order $< |G|$. Without loss of generality suppose that $m > 0$, so that $p$ divides $|G|$.

First suppose that $p$ does not divide $|Z_G|$. Referring to the class equation of $G$, we see that $p$ must fail to divide some integer $|G|/|Z_G(x_j)|$ for which $|Z_G(x_j)| < |G|$. Since $p^m$ is the exact power of $p$ dividing $|G|$, we conclude that $p^m$ divides this $|Z_G(x_j)|$ and $p^{m+1}$ does not. Since $|Z_G(x_j)| < |G|$, the inductive hypothesis shows that $Z_G(x_j)$ has a subgroup of order $p^m$, and this is a Sylow $p$-subgroup of $G$.

Now suppose that $p$ divides $|Z_G|$. The group $Z_G$ is finitely generated abelian, hence is a direct sum of cyclic groups by Theorem 4.56. Since $p$ divides $|Z_G|$, some cyclic summand has order divisible by $p$, and a suitable power of a generator of that summand is an element $c$ of order $p$ in $Z_G$. The cyclic group $C$ generated by $c$ then has order $p$. Being a subgroup of $Z_G$, $C$ is normal in $G$. The group $G/C$ has order $p^{m-1}r$, and the inductive hypothesis implies that $G/C$ has a subgroup $H$ of order $p^{m-1}$. If $\varphi : G \to G/C$ denotes the quotient map, then $\varphi^{-1}(H)$ is a subgroup of $G$ of order $|H||\ker\varphi| = p^{m-1}p = p^m$. □

For the remaining parts of Theorem 4.59, we make use of a different group action. If $\mathcal{F}$ denotes the set of all subgroups of $G$, then $G$ acts on $\mathcal{F}$ by conjugation: $(g, H) \mapsto gHg^{-1}$. The orbit of a subgroup $H$ consists of all subgroups conjugate to $H$ in $G$, and the isotropy subgroup at the point $H$ in $\mathcal{F}$ is
$$\{g \in G \mid gHg^{-1} = H\}.$$
This is a subgroup $N(H)$ of $G$ known as the normalizer of $H$ in $G$. It has the properties that $N(H) \supseteq H$ and that $H$ is a normal subgroup of $N(H)$. The counting formula of Corollary 4.35 gives
$$|\{gHg^{-1} \mid g \in G\}| = |G/N(H)|.$$
Meanwhile, application of Lagrange’s Theorem (Theorem 4.7) to the three quotients $G/H$, $G/N(H)$, and $N(H)/H$ shows that
$$|G/H| = |G/N(H)||N(H)/H|,$$
with all three factors being integers.
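The normalizer and the counting formula can be checked by brute force in the symmetric group $S_4$. In the Python sketch below (an illustrative choice: $H$ is the subgroup generated by the 3-cycle $0 \to 1 \to 2 \to 0$, and the helper names are ours), the conjugates of $H$ are counted directly and compared with $|G/N(H)|$, and the factorization $|G/H| = |G/N(H)||N(H)/H|$ is confirmed.

```python
from itertools import permutations


def compose(s, t):
    """(st)(i) = s(t(i)) for permutations stored as tuples."""
    return tuple(s[t[i]] for i in range(len(t)))


def inverse(s):
    inv = [0] * len(s)
    for i, si in enumerate(s):
        inv[si] = i
    return tuple(inv)


def conjugate(g, H):
    """gHg^{-1} as a frozenset."""
    return frozenset(compose(compose(g, h), inverse(g)) for h in H)


G = list(permutations(range(4)))                 # S_4, order 24
c = (1, 2, 0, 3)                                 # the 3-cycle 0 -> 1 -> 2 -> 0
H = frozenset({tuple(range(4)), c, compose(c, c)})

N = [g for g in G if conjugate(g, H) == H]       # the normalizer N(H)
conjugates = {conjugate(g, H) for g in G}

print(len(N), len(conjugates))                   # 6 4
print(len(G) // len(H) == (len(G) // len(N)) * (len(N) // len(H)))   # True
```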

Now assume as in the statement of Theorem 4.59 that $|G| = p^mr$ with $p$ prime and $p$ not dividing $r$. In this setting we have the following lemma.

Lemma 4.62. If $P$ is a Sylow $p$-subgroup of $G$ and if $H$ is a subgroup of the normalizer $N(P)$ whose order is a power of $p$, then $H \subseteq P$.

PROOF. Since $H \subseteq N(P)$ and $P$ is normal in $N(P)$, the set $HP$ of products is a group, by the same argument as used for $H_pH_q$ in the proof of Proposition 4.60. Then $HP/P \cong H/(H \cap P)$ by the Second Isomorphism Theorem (Theorem 4.14), and hence $|HP/P|$ is some power $p^k$ of $p$. By Lagrange’s Theorem (Theorem 4.7), $|HP| = p^{m+k}$ with $k \ge 0$. Since no subgroup of $G$ can have order $p^l$ with $l > m$, we must have $k = 0$. Thus $HP = P$ and $H \subseteq P$. □

Proof of the remainder of Theorem 4.59. Within the set $\mathcal{F}$ of all subgroups of $G$, let $\Pi$ be the set of all subgroups of $G$ of order $p^m$. We have seen that $\Pi$ is not empty. Since the conjugate of a subgroup has the same order as the subgroup, $\Pi$ is the union of orbits in $\mathcal{F}$ under conjugation by $G$. Thus we can restrict the group action by conjugation from $G \times \mathcal{F} \to \mathcal{F}$ to $G \times \Pi \to \Pi$.

Let $P$ and $P'$ be members of $\Pi$, and let $\Sigma$ and $\Sigma'$ be the $G$ orbits of $P$ and $P'$ under conjugation. Suppose that $\Sigma$ and $\Sigma'$ are distinct orbits of $G$. Let us restrict the group action by conjugation from $G \times \Pi \to \Pi$ to $P \times \Pi \to \Pi$. The $G$ orbits $\Sigma$ and $\Sigma'$ then break into $P$ orbits, and the counting formula Corollary 4.35 says for each orbit that
$$p^m = |P| = \#\{\text{subgroups in a } P \text{ orbit}\} \times |\text{isotropy subgroup within } P|.$$
Hence the number of subgroups in a $P$ orbit is of the form $p^l$ for some $l \ge 0$.

Suppose that $l = 0$. Then the $P$ orbit is some singleton set $\{P''\}$, and the corresponding isotropy subgroup within $P$ is all of $P$:
$$P = \{x \in P \mid xP''x^{-1} = P''\} \subseteq N(P'').$$
Lemma 4.62 shows that $P \subseteq P''$, and therefore $P = P''$. Thus $l = 0$ only for the $P$ orbit $\{P\}$. In other words, the number of elements in any $P$ orbit other than $\{P\}$ is divisible by $p$. Consequently $|\Sigma| \equiv 1 \bmod p$ while $|\Sigma'| \equiv 0 \bmod p$, the latter because $\Sigma$ and $\Sigma'$ are assumed distinct. But the hypotheses are symmetric in $P$ and $P'$, and the same argument with $P'$ in place of $P$ gives $|\Sigma'| \equiv 1 \bmod p$; this contradiction shows that $\Sigma$ and $\Sigma'$ must coincide. Hence there is only one $G$ orbit in $\Pi$, and it has $kp + 1$ members for some $k$. This proves parts (b) and (c) except for the fact that $kp + 1$ divides $r$.

For this divisibility let us apply the counting formula Corollary 4.35 to the orbit $\Sigma$ of $G$. The formula gives $|G| = |\Sigma|\,|\text{isotropy subgroup}|$, and hence $|\Sigma|$ divides $|G| = p^mr$. Since $|\Sigma| = kp + 1$, we have $\mathrm{GCD}(|\Sigma|, p) = 1$ and also $\mathrm{GCD}(|\Sigma|, p^m) = 1$. By Corollary 1.3, $kp + 1$ divides $r$.

Finally we prove that any subgroup $H$ of $G$ of order $p^l$ lies in some Sylow $p$-subgroup. Let $\Sigma = \Pi$ again be the $G$ orbit in $\mathcal{F}$ of subgroups of order $p^m$, and restrict the action by conjugation from $G \times \Sigma \to \Sigma$ to $H \times \Sigma \to \Sigma$. Each $H$ orbit in $\Sigma$ must have $p^a$ elements for some $a$, by one more application of the counting formula Corollary 4.35. Since $|\Sigma| \equiv 1 \bmod p$, some $H$ orbit has one element, say the $H$ orbit of $P$. Then the isotropy subgroup of $H$ at the point $P$ is all of $H$, and $H \subseteq N(P)$. By Lemma 4.62, $H \subseteq P$. This completes the proof of Theorem 4.59. □
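All three parts of Theorem 4.59 can be observed in $S_4$, of order $24 = 2^3 \cdot 3$. The Python sketch below lists the subgroups generated by at most two elements; for $S_4$ this includes every subgroup of order 1, 2, 3, 4, or 8, since each of those is cyclic, a Klein four-group, or dihedral of order 8. It then checks that the Sylow 2-subgroups and the Sylow 3-subgroups each form a single conjugacy class, that their numbers are $\equiv 1 \bmod p$ and divide $r$, and that every subgroup of order 1, 2, or 4 lies inside a Sylow 2-subgroup. The helper names are illustrative.

```python
from itertools import permutations


def compose(s, t):
    """(st)(i) = s(t(i)) for permutations stored as tuples."""
    return tuple(s[t[i]] for i in range(len(t)))


def inverse(s):
    inv = [0] * len(s)
    for i, si in enumerate(s):
        inv[si] = i
    return tuple(inv)


def generate(gens):
    """Subgroup generated by gens, as a frozenset (closure under products)."""
    e = tuple(range(len(gens[0])))
    group, frontier = {e}, [e]
    while frontier:
        x = frontier.pop()
        for g in gens:
            y = compose(g, x)
            if y not in group:
                group.add(y)
                frontier.append(y)
    return frozenset(group)


def conjugate(g, H):
    return frozenset(compose(compose(g, h), inverse(g)) for h in H)


G = list(permutations(range(4)))                      # S_4
subgroups = {generate([a, b]) for a in G for b in G}  # all 2-generated subgroups


def check(p, r, sylow):
    """n_p = 1 mod p, n_p divides r, and the Sylow p-subgroups are all conjugate."""
    conj_class = {conjugate(g, sylow[0]) for g in G}
    return len(sylow) % p == 1 and r % len(sylow) == 0 and conj_class == set(sylow)


sylow2 = [P for P in subgroups if len(P) == 8]        # p = 2, m = 3, r = 3
sylow3 = [P for P in subgroups if len(P) == 3]        # p = 3, m = 1, r = 8
inside = all(any(Q <= P for P in sylow2) for Q in subgroups if len(Q) in (1, 2, 4))
print(len(sylow2), len(sylow3), check(2, 3, sylow2), check(3, 8, sylow3), inside)
# 3 4 True True True
```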


11.  Categories  and  Functors 

The  mathematics  thus  far  in  the  book  has  taken  place  in  several  different  contexts, 
and  we  have  seen  that  the  same  notions  sometimes  recur  in  more  than  one  context, 
possibly  with  variations.  For  example  we  have  worked  with  vector  spaces,  inner- 
product  spaces,  groups,  rings,  and  fields,  and  we  have  seen  that  each  of  these  areas 
has  its  own  definition  of  isomorphism.  In  addition,  the  notion  of  direct  product 
or  direct  sum  has  arisen  in  more  than  one  of  these  contexts,  and  there  are  other 
similarities.  In  this  section  we  introduce  some  terminology  to  make  the  notion 
of  “context”  precise  and  to  provide  a  setting  for  discussing  similarities  between 
different  contexts. 

A  category  C  consists  of  three  things: 

•  a  class  of  objects,  denoted  by  Obj(C), 

• for any two objects $A$ and $B$ in the category, a set $\mathrm{Morph}(A, B)$ of morphisms,

• for any three objects $A$, $B$, and $C$ in the category, a law of composition for morphisms, i.e., a function carrying $\mathrm{Morph}(A, B) \times \mathrm{Morph}(B, C)$ into $\mathrm{Morph}(A, C)$, with the image of $(f, g)$ under composition written as $gf$,

and these are to satisfy certain properties that we list in a moment. When more than one category is under discussion, we may use notation like $\mathrm{Morph}_{\mathcal{C}}(A, B)$ to distinguish between the categories.

We are to think initially of the objects as the sets we are studying with a particular kind of structure on them; the morphisms are then the functions from one
object  to  another  that  respect  this  additional  structure,  and  the  law  of  composition 
is  just  composition  of  functions.  Indeed,  the  defining  conditions  that  are  imposed 
on  general  categories  are  arranged  to  be  obvious  for  this  special  kind  of  category, 
and  this  setting  accounts  for  the  order  in  which  we  write  the  composition  of  two 
morphisms.  But  the  definition  of  a  general  category  is  not  so  restrictive,  and  it  is 
important  not  to  restrict  the  definition  in  this  way. 

The  properties  that  are  to  be  satisfied  to  have  a  category  are  as  follows: 

(i) the sets $\mathrm{Morph}(A_1, B_1)$ and $\mathrm{Morph}(A_2, B_2)$ are disjoint unless $A_1 = A_2$ and $B_1 = B_2$ (because two functions are declared to be different unless their domains match and their ranges match, as is underscored in Section A1 of the appendix),

(ii) the law of composition satisfies the associativity property $h(gf) = (hg)f$ for $f \in \mathrm{Morph}(A, B)$, $g \in \mathrm{Morph}(B, C)$, and $h \in \mathrm{Morph}(C, D)$,

(iii) for each object $A$, there is an identity morphism $1_A$ in $\mathrm{Morph}(A, A)$ such that $f1_A = f$ and $1_Ag = g$ for $f \in \mathrm{Morph}(A, B)$ and $g \in \mathrm{Morph}(C, A)$.

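A subcategory $\mathcal{S}$ of a category $\mathcal{C}$ by definition is a category with $\mathrm{Obj}(\mathcal{S}) \subseteq \mathrm{Obj}(\mathcal{C})$ and $\mathrm{Morph}_{\mathcal{S}}(A, B) \subseteq \mathrm{Morph}_{\mathcal{C}}(A, B)$ whenever $A$ and $B$ are in $\mathrm{Obj}(\mathcal{S})$, and it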
is  assumed  that  the  laws  of  composition  in  S  and  C  are  consistent  when  both  are 
defined. 
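The category of finite sets can be modeled in a few lines, which makes properties (i) through (iii) concrete. In the Python sketch below (the class `Morphism` and the helper functions are hypothetical, not a library API), a morphism records its domain and codomain along with its graph, so two functions with the same values but different codomains count as different morphisms, as (i) requires; the composite $gf$ is refused unless the codomain of $f$ equals the domain of $g$.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Morphism:
    """A morphism of finite sets: domain and codomain are part of the data."""
    domain: frozenset
    codomain: frozenset
    graph: frozenset          # the pairs (x, f(x))

    def __call__(self, x):
        return dict(self.graph)[x]


def morphism(domain, codomain, rule):
    domain, codomain = frozenset(domain), frozenset(codomain)
    graph = frozenset((x, rule(x)) for x in domain)
    if not all(y in codomain for _, y in graph):
        raise ValueError("rule does not map the domain into the codomain")
    return Morphism(domain, codomain, graph)


def compose(g, f):
    """gf: apply f first, then g; defined only when codomain(f) = domain(g)."""
    if f.codomain != g.domain:
        raise ValueError("gf is undefined: codomain of f differs from domain of g")
    return morphism(f.domain, g.codomain, lambda x: g(f(x)))


def identity(A):
    return morphism(A, A, lambda x: x)


A, B, C, D = {1, 2, 3}, {"a", "b"}, {0, 1}, {"no", "yes"}
f = morphism(A, B, lambda x: "a" if x < 3 else "b")
g = morphism(B, C, lambda y: int(y == "b"))
h = morphism(C, D, lambda z: "yes" if z == 1 else "no")

print(compose(h, compose(g, f)) == compose(compose(h, g), f))    # True: (ii)
print(compose(f, identity(A)) == f == compose(identity(B), f))   # True: (iii)
print(f == morphism(A, B | {"c"}, f))                            # False: (i)
```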

Here  are  several  examples  in  which  the  morphisms  are  functions  and  the  law 
of  composition  is  ordinary  composition  of  functions.  They  are  usually  identified 
in  practice  just  by  naming  their  objects,  since  the  morphisms  are  understood  to 
be  all  functions  from  one  object  to  another  respecting  the  additional  structure  on 
the  objects. 

Examples  of  categories. 

(1)  The  category  of  all  sets.  An  object  A  is  a  set,  and  a  morphism  in  the  set 
Morph(A,  B)  is  a  function  from  A  into  B. 

(2) The category of all vector spaces over a field $\mathbb{F}$. The morphisms are linear maps.

(3)  The  category  of  all  groups.  The  morphisms  are  group  homomorphisms. 
(4) The category of all abelian groups. The morphisms again are group homomorphisms. This is a subcategory of the previous example.

(5)  The  category  of  all  rings.  The  morphisms  are  all  ring  homomorphisms. 
The  kernel  and  the  image  of  a  morphism  are  necessarily  objects  of  the  category. 

(6) The category of all rings with identity. The morphisms are all ring homomorphisms carrying identity to identity. This is a subcategory of the previous
example.  The  image  of  a  morphism  is  necessarily  an  object  of  the  category,  but 
the  kernel  of  a  morphism  is  usually  not  in  the  category. 

(7)  The  category  of  all  fields.  The  morphisms  are  as  in  Example  6,  and  the 
result  is  a  subcategory  of  Example  6.  In  this  case  any  morphism  is  necessarily 
one-one  and  carries  inverses  to  inverses. 

(8) The category of all group actions by a particular group $G$. If $G$ acts on $X$ and on $Y$, then a morphism from the one space to the other is a $G$ equivariant mapping from $X$ to $Y$, i.e., a function $\varphi : X \to Y$ such that $\varphi(gx) = g\varphi(x)$ for all $g$ in $G$ and $x$ in $X$.

(9) The category of all representations by a particular group $G$ on a vector space over a particular field $\mathbb{F}$. The morphisms are the linear $G$ equivariant functions.
This  is  a  subcategory  of  the  previous  example. 

Readers who are familiar with point-set topology will recognize that one can impose topologies on everything in the above examples, insisting that the functions be continuous, and again we obtain examples of categories. For example the category of all topological spaces consists of objects that are topological spaces and morphisms that are continuous functions. The category of all continuous group actions by a particular topological group has objects that are group actions $G \times X \to X$ that are continuous functions, and the morphisms are the equivariant functions that are continuous.

Readers  who  are  familiar  with  manifolds  will  recognize  that  another  example 
is  the  category  of  all  smooth  manifolds,  which  consists  of  objects  that  are  smooth 
manifolds  and  morphisms  that  are  smooth  functions. 

The  morphisms  in  a  category  need  not  be  functions  in  the  usual  sense.  An 
important example is the “opposite category” $\mathcal{C}^{\mathrm{opp}}$ to a category $\mathcal{C}$, which is a
handy  technical  device  and  is  discussed  in  Problems  78-80  at  the  end  of  the 
chapter. 

In  all  of  the  above  examples  of  categories,  the  class  of  objects  fails  to  be  a  set. 
This  behavior  is  typical.  However,  it  does  not  cause  problems  in  practice  because 
in  any  particular  argument  involving  categories,  we  can  restrict  to  a  subcategory 
for which the objects do form a set.¹⁷

¹⁷For the interested reader, a book that pays closer attention to the inherent set-theoretic difficulties in the theory is Mac Lane’s Categories for the Working Mathematician.
If $\mathcal{C}$ is a category, a morphism $\varphi \in \mathrm{Morph}(A, B)$ is said to be an isomorphism if there exists a morphism $\psi \in \mathrm{Morph}(B, A)$ such that $\psi\varphi = 1_A$ and $\varphi\psi = 1_B$. In this case we say that $A$ is isomorphic to $B$ in the category $\mathcal{C}$. Let us check that the morphism $\psi$ is unique if it exists. In fact, if $\psi'$ is a member of $\mathrm{Morph}(B, A)$ with $\psi'\varphi = 1_A$ and $\varphi\psi' = 1_B$, then
$$\psi = 1_A\psi = (\psi'\varphi)\psi = \psi'(\varphi\psi) = \psi'1_B = \psi'.$$

We can therefore call $\psi$ the inverse to $\varphi$.

The relation “is isomorphic to” is an equivalence relation.¹⁸ In fact, the relation is symmetric by definition, and it is reflexive because $1_A \in \mathrm{Morph}(A, A)$ has $1_A$ as inverse. For transitivity let $\varphi_1 \in \mathrm{Morph}(A, B)$ and $\varphi_2 \in \mathrm{Morph}(B, C)$ be isomorphisms, with respective inverses $\psi_1 \in \mathrm{Morph}(B, A)$ and $\psi_2 \in \mathrm{Morph}(C, B)$. Then $\varphi_2\varphi_1$ is in $\mathrm{Morph}(A, C)$, and $\psi_1\psi_2$ is in $\mathrm{Morph}(C, A)$. Calculation gives
$$(\psi_1\psi_2)(\varphi_2\varphi_1) = \psi_1(\psi_2(\varphi_2\varphi_1)) = \psi_1((\psi_2\varphi_2)\varphi_1) = \psi_1(1_B\varphi_1) = \psi_1\varphi_1 = 1_A,$$
and similarly $(\varphi_2\varphi_1)(\psi_1\psi_2) = 1_C$. Therefore $\varphi_2\varphi_1 \in \mathrm{Morph}(A, C)$ is an isomorphism, and “is isomorphic to” is an equivalence relation. When $A$ is isomorphic to $B$, it is permissible to say that $A$ and $B$ are isomorphic.

The next step is to abstract a frequent kind of construction that we have used with our categories. If $\mathcal{C}$ and $\mathcal{D}$ are two categories, a covariant functor $F : \mathcal{C} \to \mathcal{D}$ associates to each object $A$ in $\mathrm{Obj}(\mathcal{C})$ an object $F(A)$ in $\mathrm{Obj}(\mathcal{D})$ and to each pair of objects $A$ and $B$ and morphism $f$ in $\mathrm{Morph}_{\mathcal{C}}(A, B)$ a morphism $F(f)$ in $\mathrm{Morph}_{\mathcal{D}}(F(A), F(B))$ such that

(i) $F(gf) = F(g)F(f)$ for $f \in \mathrm{Morph}_{\mathcal{C}}(A, B)$ and $g \in \mathrm{Morph}_{\mathcal{C}}(B, C)$,

(ii) $F(1_A) = 1_{F(A)}$ for $A$ in $\mathrm{Obj}(\mathcal{C})$.

Examples  of  covariant  functors. 

(1)  Inclusion  of  a  subcategory  into  a  category  is  a  covariant  functor. 

(2) Let $\mathcal{C}$ be the category of all sets. If $F$ carries each set $X$ to the set $2^X$ of all subsets of $X$, then $F$ is a covariant functor as soon as its effect on functions between sets, i.e., its effect on morphisms, is defined in an appropriate way. Namely, if $f : X \to Y$ is a function, then $F(f)$ is to be a function from $F(X) = 2^X$ to $F(Y) = 2^Y$. That is, we need a definition of $F(f)(A)$ as a subset of $Y$ whenever $A$ is a subset of $X$. A natural way of making such a definition is to put $F(f)(A) = f(A)$, and then $F$ is indeed a covariant functor.

(3)  Let  C  be  any  of  Examples  2  through  6  of  categories  above,  and  let  D  be 
the  category  of  all  sets,  as  in  Example  1  of  categories.  If  F  carries  an  object  A  in 
C  (i.e.,  a  vector  space,  group,  ring,  etc.)  into  its  underlying  set  and  carries  each 
morphism  into  its  underlying  function  between  two  sets,  then  F  is  a  covariant 
functor  and  furnishes  an  example  of  what  is  called  a  forgetful  functor. 

¹⁸Technically one considers relations only when they are defined on sets, and the class of objects
in  a  category  is  typically  not  a  set.  However,  just  as  with  vector  spaces,  groups,  and  so  on,  we  can 
restrict  attention  in  any  particular  situation  to  a  subcategory  for  which  the  objects  do  form  a  set,  and 
then  there  is  no  difficulty. 
(4) Let $\mathcal{C}$ be the category of all vector spaces over a field $\mathbb{F}$, let $U$ be a vector space over $\mathbb{F}$, and let $F : \mathcal{C} \to \mathcal{C}$ be defined on a vector space $V$ to be the vector space of linear maps $F(V) = \mathrm{Hom}_{\mathbb{F}}(U, V)$. The set of morphisms $\mathrm{Morph}_{\mathcal{C}}(V_1, V_2)$ is $\mathrm{Hom}_{\mathbb{F}}(V_1, V_2)$. If $f$ is in $\mathrm{Morph}_{\mathcal{C}}(V_1, V_2)$, then $F(f)$ is to be in $\mathrm{Morph}_{\mathcal{C}}(\mathrm{Hom}_{\mathbb{F}}(U, V_1), \mathrm{Hom}_{\mathbb{F}}(U, V_2))$, and the definition is that $F(f)(L) = f \circ L$ for $L \in \mathrm{Hom}_{\mathbb{F}}(U, V_1)$. Then $F$ is a covariant functor: to check that $F(gf) = F(g)F(f)$ when $g$ is in $\mathrm{Morph}_{\mathcal{C}}(V_2, V_3)$, we write $F(gf)(L) = gf \circ L = g \circ (f \circ L) = g \circ F(f)(L) = (F(g)F(f))(L)$.

(5) Let C be the category of all groups, let D be the category of all sets, let G
be a group, and let $F : C \to D$ be the functor defined as follows. For a group
H, $F(H)$ is the set of all group homomorphisms from G into H. The set of
morphisms $\mathrm{Morph}_C(H_1, H_2)$ is the set of group homomorphisms from $H_1$ into
$H_2$. If f is in $\mathrm{Morph}_C(H_1, H_2)$, then $F(f)$ is to be a function with domain the set
of homomorphisms from G into $H_1$ and with range the set of homomorphisms
from G into $H_2$. Let $F(f)(\varphi) = f \circ \varphi$. Then F is a covariant functor.
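For cyclic groups, Example 5 can be made concrete. This Python sketch assumes (our choice, not the text's) that G and the groups $H_i$ are additive cyclic groups $\mathbb{Z}/m$, with a homomorphism stored as its tuple of values. It enumerates $\mathrm{Hom}(G, H)$ and checks $F(gf) = F(g)F(f)$ for one hypothetical pair f, g.

```python
def homs(m, n):
    """All group homomorphisms Z/m -> Z/n, each stored as the tuple of
    images of 0, 1, ..., m-1.  Such a homomorphism is determined by the
    image t of 1, and t is allowed exactly when m*t = 0 in Z/n."""
    return {tuple((k * t) % n for k in range(m))
            for t in range(n) if (m * t) % n == 0}

def F(f):
    """Covariant functor Hom(G, -) on a homomorphism f : H1 -> H2 (stored as
    a tuple of images): F(f) sends phi : G -> H1 to the composite f o phi."""
    return lambda phi: tuple(f[x] for x in phi)

# Hypothetical choices: G = Z/4, H1 = Z/6, H2 = Z/3, H3 = Z/3.
f = tuple(k % 3 for k in range(6))          # reduction Z/6 -> Z/3
g = tuple((2 * k) % 3 for k in range(3))    # multiplication by 2 on Z/3
gf = tuple(g[f[k]] for k in range(6))       # the composite g o f

for phi in homs(4, 6):
    assert F(f)(phi) in homs(4, 3)          # F(f) lands in Hom(G, H2)
    assert F(gf)(phi) == F(g)(F(f)(phi))    # F(gf) = F(g)F(f)
```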

(6) Let C be the category of all sets, and let D be the category of all abelian
groups. To a set S, associate the free abelian group $F(S)$ with S as $\mathbb{Z}$ basis.
If $f : S \to S'$ is a function, then the universal mapping property of external
direct sums of abelian groups (Proposition 4.17) yields a corresponding group
homomorphism from $F(S)$ to $F(S')$, namely the one carrying each basis element s
to the basis element $f(s)$, and we define this group homomorphism to be $F(f)$.
Then F is a covariant functor.
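A finitely supported formal sum $\sum n_s s$ can be modeled as a dict from basis elements to nonzero integers. Under that modeling assumption (ours, not the text's), the homomorphism $F(f)$ of Example 6 simply adds up coefficients along the fibers of f, as in this Python sketch with hypothetical data.

```python
from collections import Counter

def normalize(coeffs):
    """Drop zero coefficients so that equal group elements compare equal."""
    return {s: n for s, n in coeffs.items() if n != 0}

def add(u, v):
    """Group law of the free abelian group: add coefficients basis-wise."""
    total = Counter(u)
    for s, n in v.items():
        total[s] += n
    return normalize(total)

def F(f):
    """Free abelian group functor on f : S -> S' (a dict): F(f) sends
    sum n_s * s to sum n_s * f(s)."""
    def hom(element):
        image = Counter()
        for s, n in element.items():
            image[f[s]] += n
        return normalize(image)
    return hom

# Hypothetical data: S = {x, y, z}, S' = {p, q}, S'' = {0}.
f = {"x": "p", "y": "p", "z": "q"}
g = {"p": 0, "q": 0}
gf = {s: g[f[s]] for s in f}

u = {"x": 2, "y": -2, "z": 1}      # the element 2x - 2y + z
v = {"x": 1, "z": -1}              # the element x - z
assert F(f)(add(u, v)) == add(F(f)(u), F(f)(v))   # F(f) is a homomorphism
assert F(gf)(u) == F(g)(F(f)(u))                  # F(gf) = F(g)F(f)
```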

(7) Let C be the category of all finite sets, fix a commutative ring R with
identity, and let D be the category of all commutative rings with identity. To
a finite set S, associate the commutative ring $F(S) = R[\{X_s \mid s \in S\}]$. If
$f : S \to S'$ is a function, then the properties of substitution homomorphisms
give us a corresponding homomorphism of rings with identity carrying $F(S)$ to
$F(S')$, namely the one that is the identity on R and carries each $X_s$ to $X_{f(s)}$,
and the result is a covariant functor.

There is a second kind of functor of interest to us. If C and D are two categories,
a contravariant functor $F : C \to D$ associates to each object A in Obj(C) an
object $F(A)$ in Obj(D) and to each pair of objects A and B and morphism f in
$\mathrm{Morph}_C(A, B)$ a morphism $F(f)$ in $\mathrm{Morph}_D(F(B), F(A))$ such that

(i) $F(gf) = F(f)F(g)$ for $f \in \mathrm{Morph}_C(A, B)$ and $g \in \mathrm{Morph}_C(B, C)$,

(ii) $F(1_A) = 1_{F(A)}$ for A in Obj(C).

Examples  of  contravariant  functors. 

(1) Let C be the category of all vector spaces over a field $\mathbb{F}$, let W be a
vector space over $\mathbb{F}$, and let $F : C \to C$ be defined on a vector space V to be
the vector space of linear maps $F(V) = \mathrm{Hom}_{\mathbb{F}}(V, W)$. The set of morphisms
$\mathrm{Morph}_C(V_1, V_2)$ is $\mathrm{Hom}_{\mathbb{F}}(V_1, V_2)$. If f is in $\mathrm{Morph}_C(V_1, V_2)$, then $F(f)$ is to be in
$\mathrm{Morph}_C(\mathrm{Hom}_{\mathbb{F}}(V_2, W), \mathrm{Hom}_{\mathbb{F}}(V_1, W))$, and the definition is that $F(f)(L) =
L \circ f$ for $L \in \mathrm{Hom}_{\mathbb{F}}(V_2, W)$. Then F is a contravariant functor: to check
that $F(gf) = F(f)F(g)$ when g is in $\mathrm{Morph}_C(V_2, V_3)$, we write $F(gf)(L) =
L \circ gf = Lg \circ f = F(f)(Lg) = F(f)(F(g)(L))$ for $L \in \mathrm{Hom}_{\mathbb{F}}(V_3, W)$.

(2) Let C be the category of all vector spaces over a field $\mathbb{F}$, define F of a
vector space V to be the dual vector space $V'$, and define F of a linear mapping
f between two vector spaces V and W to be the contragredient $f'$ carrying $W'$
into $V'$, defined by $f'(w')(v) = w'(f(v))$. This is the special case of Example 1
of contravariant functors in which $W = \mathbb{F}$. Hence F is a contravariant functor.
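In coordinates, Example 2 says that the contragredient is computed by the transpose. The sketch below assumes finite-dimensional spaces $\mathbb{Q}^n$ with standard bases and their dual bases; the matrices A and B are hypothetical. It checks the defining identity $f'(w')(v) = w'(f(v))$ and the contravariance $(gf)' = f'g'$.

```python
def transpose(A):
    return [list(row) for row in zip(*A)]

def matmul(A, B):
    """Matrix product AB (A is r-by-m, B is m-by-c)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def apply(A, v):
    """Apply the matrix A to the column vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def evaluate(functional, vector):
    """Value w'(w) of a functional given by its coordinates in the dual basis."""
    return sum(a * b for a, b in zip(functional, vector))

def contragredient(A):
    """If f has matrix A in the standard bases, then f'(w') = w' o f has
    coordinates given by the transpose of A in the dual bases."""
    return lambda functional: apply(transpose(A), functional)

# Hypothetical maps f : Q^2 -> Q^3 (matrix A) and g : Q^3 -> Q^2 (matrix B).
A = [[1, 2], [0, 1], [3, -1]]
B = [[1, 0, 2], [-1, 1, 0]]

v, w_dual, u_dual = [5, -7], [2, 0, 1], [4, -3]
# Defining identity f'(w')(v) = w'(f(v)).
assert evaluate(contragredient(A)(w_dual), v) == evaluate(w_dual, apply(A, v))
# Contravariance: (gf)' = f' g'.
assert contragredient(matmul(B, A))(u_dual) == contragredient(A)(contragredient(B)(u_dual))
```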

(3) Let C be the category of all groups, let D be the category of all sets, let G
be a group, and let $F : C \to D$ be the functor defined as follows. For a group
H, $F(H)$ is the set of all group homomorphisms from H into G. The set of
morphisms $\mathrm{Morph}_C(H_1, H_2)$ is the set of group homomorphisms from $H_1$ into
$H_2$. If f is in $\mathrm{Morph}_C(H_1, H_2)$, then $F(f)$ is to be a function with domain the set
of homomorphisms from $H_2$ into G and with range the set of homomorphisms
from $H_1$ into G. The definition is $F(f)(\varphi) = \varphi \circ f$. Then F is a contravariant
functor.

It  is  an  important  observation  about  functors  that  the  composition  of  two 
functors  is  a  functor.  This  is  immediate  from  the  definition.  If  the  two  functors 
are  both  covariant  or  both  contravariant,  then  the  composition  is  covariant.  If 
one  of  them  is  covariant  and  the  other  is  contravariant,  then  the  composition  is 
contravariant. 


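[Diagram: a square with vertices A, B, C, D and arrows $\alpha : A \to B$, $\beta : A \to C$,
$\gamma : B \to D$, $\delta : C \to D$.]

FIGURE 4.9. A square diagram. The square commutes if $\gamma\alpha = \delta\beta$.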


In  the  subject  of  category  theory,  a  great  deal  of  information  is  conveyed  by 
“commutative  diagrams”  of  objects  and  morphisms.  By  a  diagram  is  meant  a 
directed  graph,  usually  but  not  necessarily  planar,  in  which  the  vertices  represent 
some  relevant  objects  in  a  category  and  the  arrows  from  one  vertex  to  another 
represent  morphisms  of  interest  between  pairs  of  these  objects.  Often  the  vertices 
and  arrows  are  labeled,  but  in  fact  labels  on  the  vertices  can  be  deduced  from  the 
labels on the arrows since any morphism determines its “domain” and “range”
as a consequence of defining property (i) of categories. A diagram is said to be
commutative if for each pair of vertices A and B, any two directed paths from
A to B yield the same composition of the morphisms along them. For
example a square as in Figure 4.9 is commutative if $\gamma\alpha = \delta\beta$. The triangular
diagrams in Figures 4.1 through 4.8 are all commutative.


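[Diagrams: the square with vertices $F(A), F(B), F(C), F(D)$ and arrows $F(\alpha), F(\beta), F(\gamma), F(\delta)$
pointing as in Figure 4.9; and the square with vertices $G(A), G(B), G(C), G(D)$ and arrows
$G(\alpha), G(\beta), G(\gamma), G(\delta)$, each pointing in the reverse direction.]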

FIGURE  4.10.  Diagrams  obtained  by  applying  a  covariant  functor  F 
and  a  contravariant  functor  G  to  the  diagram  in  Figure  4.9. 


Functors can be applied to diagrams, yielding new diagrams. For example,
suppose that Figure 4.9 is a diagram in the category C, that $F : C \to D$ is a
covariant functor, and that $G : C \to D$ is a contravariant functor. Then we
can apply F and G to the diagram in Figure 4.9, obtaining the two diagrams in
the category D that are pictured in Figure 4.10. If the diagram in Figure 4.9 is
commutative, then so are the diagrams in Figure 4.10, as a consequence of the
effect of functors on compositions of morphisms: from $\gamma\alpha = \delta\beta$ we obtain
$F(\gamma)F(\alpha) = F(\gamma\alpha) = F(\delta\beta) = F(\delta)F(\beta)$ and
$G(\alpha)G(\gamma) = G(\gamma\alpha) = G(\delta\beta) = G(\beta)G(\delta)$.
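As a concrete instance, the following Python sketch (hypothetical data: A = {0, ..., 11} with reduction maps) builds a commuting square of finite sets and applies two functors to it: the covariant power-set functor of Example 2 of covariant functors (images) and, as an extra standard example not discussed in the text, the contravariant power-set functor that sends a function to its inverse-image map. Both resulting squares commute, as the argument above predicts.

```python
from itertools import chain, combinations

def subsets(X):
    """All subsets of the finite set X, as a list of frozensets."""
    items = sorted(X)
    return [frozenset(c)
            for c in chain.from_iterable(combinations(items, r)
                                         for r in range(len(items) + 1))]

def image(f):
    """Covariant power-set functor on a function f: T -> f(T)."""
    return lambda T: frozenset(f(x) for x in T)

def preimage(f, domain):
    """Contravariant power-set functor on f : domain -> codomain: T -> f^{-1}(T)."""
    return lambda T: frozenset(x for x in domain if f(x) in T)

# Hypothetical square: alpha: A -> B, beta: A -> C, gamma: B -> D, delta: C -> D.
A, B, C, D = range(12), range(4), range(6), range(2)
alpha = lambda x: x % 4
beta = lambda x: x % 6
gamma = lambda y: y % 2
delta = lambda z: z % 2

# The square commutes: gamma o alpha = delta o beta.
assert all(gamma(alpha(x)) == delta(beta(x)) for x in A)

# Covariant image: F(gamma)F(alpha) = F(delta)F(beta) on subsets of A.
assert all(image(gamma)(image(alpha)(T)) == image(delta)(image(beta)(T))
           for T in subsets(A))

# Contravariant image (arrows reversed): G(alpha)G(gamma) = G(beta)G(delta).
assert all(preimage(alpha, A)(preimage(gamma, B)(T))
           == preimage(beta, A)(preimage(delta, C)(T))
           for T in subsets(D))
```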

The  subject  of  category  theory  seeks  to  analyze  functors  that  make  sense  for 
all  categories,  or  at  least  all  categories  satisfying  some  additional  properties. 
The  most  important  investigation  of  this  kind  is  concerned  with  homology  and 
cohomology,  as  well  as  their  ramifications,  for  “abelian  categories,”  which  include 
several  important  examples  affecting  algebra,  topology,  and  several  complex 
variables.  The  topic  in  question  is  called  “homological  algebra”  and  is  discussed 
further  in  Advanced  Algebra,  particularly  in  Chapter  IV. 

There  are  a  number  of  other  functors  that  are  investigated  in  category  theory, 
and  we  mention  four: 

•  products,  including  direct  products, 

•  coproducts,  including  direct  sums, 

•  direct  limits,  also  called  inductive  limits, 

•  inverse  limits,  also  called  projective  limits. 

We  discuss  general  products  and  coproducts  in  the  present  section,  omitting  a 
general  discussion  of  direct  limits  and  inverse  limits.  Inverse  limits  will  arise  in 
Section VII.6 of Advanced Algebra for one category in connection with Galois
groups,  but  we  shall  handle  that  one  situation  on  its  own  without  attempting  a 
generalization.  An  attempt  in  the  1960s  to  recast  as  much  mathematics  as  possible 
in  terms  of  category  theory  is  now  regarded  by  many  mathematicians  as  having 
been  overdone,  and  it  seems  wiser  to  cast  bodies  of  mathematics  in  the  framework 
of  category  theory  only  when  doing  so  can  be  justified  by  the  amount  of  time  saved 
by  eliminating  redundant  arguments. 
When a category C and a nonempty set S are given, we can define a category
$C^S$. The objects of $C^S$ are functions on S with the property that the value of the
function at each s in S is in Obj(C), two such functions being regarded as the
same if they consist of the same ordered pairs.19 Let us refer to such a function
as an S-tuple of members of Obj(C), denoting it by an expression like $\{X_s\}_{s \in S}$.
A morphism in $\mathrm{Morph}_{C^S}(\{X_s\}_{s \in S}, \{Y_s\}_{s \in S})$ is an S-tuple $\{f_s\}_{s \in S}$ of morphisms
of C such that $f_s$ lies in $\mathrm{Morph}_C(X_s, Y_s)$ for all s, and the law of composition of
such morphisms takes place coordinate by coordinate.

Let $\{X_s\}_{s \in S}$ be an object in $C^S$. A product of $\{X_s\}_{s \in S}$ is a pair $(X, \{p_s\}_{s \in S})$
such that X is in Obj(C) and each $p_s$ is in $\mathrm{Morph}_C(X, X_s)$ with the following
universal mapping property: whenever A in Obj(C) is given and a morphism
$\varphi_s \in \mathrm{Morph}_C(A, X_s)$ is given for each s, then there exists a unique morphism
$\varphi \in \mathrm{Morph}_C(A, X)$ such that $p_s\varphi = \varphi_s$ for all s. The relevant diagram is pictured
in Figure 4.11.


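[Diagram: $p_s : X \to X_s$, $\varphi_s : A \to X_s$, and the induced morphism $\varphi : A \to X$ with $p_s\varphi = \varphi_s$.]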


FIGURE  4.11.  Universal  mapping  property  of  a  product  in  a  category. 


Examples  of  products. 

(1) Products exist in the category of vector spaces over a field $\mathbb{F}$. If vector
spaces $V_s$ indexed by a nonempty set S are given, then their product exists in the
category, and an example is their external direct product $\prod_{s \in S} V_s$ according to
Figure 2.4 and the discussion around it.

(2) Products exist in the category of all groups. If groups $G_s$ indexed by a
nonempty set S are given, then their product exists in the category, and an example
is their external direct product $\prod_{s \in S} G_s$ according to Figure 4.2 and Proposition
4.15. If the groups $G_s$ are abelian, then $\prod_{s \in S} G_s$ is abelian, and it follows that
products exist in the category of all abelian groups.

(3) Products exist in the category of all sets. If sets $X_s$ indexed by a nonempty
set S are given, then their product exists in the category, and an example is their
Cartesian product $\times_{s \in S} X_s$ with the coordinate projections as the maps $p_s$, as one
easily checks.
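For finitely many finite sets, the universal mapping property of Example 3 can be exercised directly: the induced map is $a \mapsto (\varphi_s(a))_s$, and it is the only possible choice because an element of the Cartesian product is determined by its coordinates. The sets and maps in this Python sketch are hypothetical, and S is taken to be {0, 1} for simplicity.

```python
from itertools import product

def cartesian_product(sets):
    """Product of finite sets X_0, ..., X_{n-1}: the Cartesian product
    together with its coordinate projections p_s."""
    X = set(product(*sets))
    projections = [lambda x, s=s: x[s] for s in range(len(sets))]
    return X, projections

def induced_map(phis):
    """Given phi_s : A -> X_s, return phi : A -> prod X_s with p_s o phi = phi_s.
    It is unique: an element of the product is determined by its coordinates."""
    return lambda a: tuple(phi_s(a) for phi_s in phis)

# Hypothetical data: X_0 = {0, 1}, X_1 = {"u", "v", "w"}, A = {0, ..., 4}.
X, (p0, p1) = cartesian_product([{0, 1}, {"u", "v", "w"}])
A = range(5)
phi0 = lambda a: a % 2
phi1 = lambda a: "uvw"[a % 3]
phi = induced_map([phi0, phi1])

assert all(phi(a) in X for a in A)
assert all(p0(phi(a)) == phi0(a) and p1(phi(a)) == phi1(a) for a in A)
```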

(4) Products exist in the category of all rings and in the category of all rings with
identity. If objects $R_s$ in the category indexed by a nonempty set S are given, then

19In  other  words,  the  range  of  such  a  function  is  considered  as  irrelevant.  We  might  think  of  the 
range  as  Obj(C)  except  for  the  fact  that  a  function  is  supposed  to  have  a  set  as  range  and  Obj(C)  need 
not  be  a  set. 
their product may be taken as an abelian group to be the external direct product
$\prod_{s \in S} R_s$, with multiplication defined coordinate by coordinate, and the group
homomorphisms $p_s$ are easily checked to be morphisms in the category.

A  product  of  objects  in  a  category  need  not  exist  in  the  category.  An  artificial 
example  may  be  formed  as  follows:  Let  C  be  a  category  with  one  object  G,  namely 
a group of order 2, and let $\mathrm{Morph}(G, G) = \{0, 1_G\}$, where 0 is the trivial
homomorphism, the law of composition being the usual composition. Let S be a
2-element set, and let the corresponding objects be $X_1 = G$ and $X_2 = G$. The
claim is that the product $X_1 \times X_2$ does not exist in C. In fact, take A = G.
There are four S-tuples of morphisms $(\varphi_1, \varphi_2)$ meeting the conditions of the
definition. Yet the only possibility for the product is X = G, and then there are
only two possible $\varphi$'s in $\mathrm{Morph}(A, X)$. Since each $\varphi$ yields just one pair
$(p_1\varphi, p_2\varphi)$, we cannot account for all four possible S-tuples of morphisms, and
the product cannot exist.
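The counting argument above can be confirmed by brute force. This Python sketch assumes a concrete model of our own choosing ($G = \mathbb{Z}/2$, morphisms stored as tuples of values) and checks that for every candidate pair of projections $p_1, p_2$ the map $\varphi \mapsto (p_1\varphi, p_2\varphi)$ misses some of the four pairs.

```python
from itertools import product

# Hypothetical model: G = Z/2 = {0, 1}; a morphism G -> G is stored as the
# tuple (image of 0, image of 1).  Morph(G, G) = {0, 1_G} as in the text.
ZERO, IDENTITY = (0, 0), (0, 1)
MORPH = [ZERO, IDENTITY]

def compose(g, f):
    """The composite g o f of two morphisms G -> G."""
    return tuple(g[f[x]] for x in range(2))

def could_be_product(p1, p2):
    """With X = G and projections p1, p2, a product needs every pair
    (phi_1, phi_2) to arise as (p1 o phi, p2 o phi) for some phi."""
    reached = {(compose(p1, phi), compose(p2, phi)) for phi in MORPH}
    return reached == set(product(MORPH, repeat=2))

# Morph(G, G) is closed under composition, so C really is a category ...
assert all(compose(g, f) in MORPH for g in MORPH for f in MORPH)
# ... but no choice of p1, p2 makes (G, {p1, p2}) a product of G with itself.
assert not any(could_be_product(p1, p2) for p1, p2 in product(MORPH, repeat=2))
```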

What category theory does settle is uniqueness. A product is
always  unique  up  to  canonical  isomorphism,  according  to  Proposition  4.63.  We 
proved  uniqueness  for  products  in  the  special  cases  of  Examples  1  and  2  above 
in  Propositions  2.32  and  4.16. 

Proposition 4.63. Let C be a category, and let S be a nonempty set. If $\{X_s\}_{s \in S}$
is an object in $C^S$ and if $(X, \{p_s\})$ and $(X', \{p'_s\})$ are two products, then there
exists a unique morphism $\Phi : X' \to X$ such that $p'_s = p_s \circ \Phi$ for all $s \in S$, and
$\Phi$ is an isomorphism.

Remark.  There  is  no  assertion  that  ps  is  onto  Xs.  In  fact,  “onto”  has  no 
meaning  for  a  general  category. 

Proof. In Figure 4.11 let $A = X'$ and $\varphi_s = p'_s$. If $\Phi \in \mathrm{Morph}(X', X)$
is the morphism produced by the fact that X is a product, then we have
$p_s\Phi = p'_s$ for all s. Reversing the roles of X and X', we obtain a morphism
$\Phi' \in \mathrm{Morph}(X, X')$ with $p'_s\Phi' = p_s$ for all s. Therefore $p_s(\Phi\Phi') = (p_s\Phi)\Phi' =
p'_s\Phi' = p_s$.

In Figure 4.11 we next let $A = X$ and $\varphi_s = p_s$ for all s. Then the identity $1_X$
in $\mathrm{Morph}(X, X)$ has the same property $p_s 1_X = p_s$ relative to all $p_s$ that $\Phi\Phi'$ has,
and the uniqueness in the statement of the universal mapping property implies that
$\Phi\Phi' = 1_X$. Reversing the roles of X and X', we obtain $\Phi'\Phi = 1_{X'}$. Therefore
$\Phi$ is an isomorphism.

For uniqueness suppose that $\Psi \in \mathrm{Morph}(X', X)$ is another morphism with
$p'_s = p_s\Psi$ for all $s \in S$. Then the argument of the previous paragraph shows that
$\Phi'\Psi = 1_{X'}$. Consequently $\Psi = 1_X\Psi = (\Phi\Phi')\Psi = \Phi(\Phi'\Psi) = \Phi 1_{X'} = \Phi$. □

If  products  always  exist  in  a  particular  category,  they  are  not  unique,  only 
unique  up  to  canonical  isomorphism.  Such  a  product  is  commonly  denoted  by 
$\prod_{s \in S} X_s$, even though it is not uniquely defined. It is customary to treat the
product over S as a covariant functor $F : C^S \to C$, the effect of the functor on
objects being given by $F(\{X_s\}_{s \in S}) = \prod_{s \in S} X_s$. For a well-defined functor we
have to fix a choice of product for each object under consideration20 in $\mathrm{Obj}(C^S)$.
For the effect of F on morphisms, we argue with the universal mapping property.
Thus let $\{X_s\}_{s \in S}$ and $\{Y_s\}_{s \in S}$ be objects in $C^S$, let $f_s$ be in $\mathrm{Morph}_C(X_s, Y_s)$ for all
s, and let the products in question be $(\prod_{s \in S} X_s, \{p_s\}_{s \in S})$ and $(\prod_{s \in S} Y_s, \{q_s\}_{s \in S})$.
Then $f_{s_0} p_{s_0}$ is in $\mathrm{Morph}_C(\prod_{s \in S} X_s, Y_{s_0})$ for each $s_0$, and the universal mapping
property gives us f in $\mathrm{Morph}_C(\prod_{s \in S} X_s, \prod_{s \in S} Y_s)$ such that $q_s f = f_s p_s$ for all
s. We define this f to be $F(\{f_s\}_{s \in S})$, and we readily check that F is a functor.
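In the category of sets, with Cartesian products as the chosen products, the morphism $F(\{f_s\})$ constructed above is the coordinatewise map $(x_s)_s \mapsto (f_s(x_s))_s$. The Python sketch below, with hypothetical finite data and S = {0, 1}, checks the defining relation $q_s f = f_s p_s$ and the functor identity $F(\{g_s f_s\}) = F(\{g_s\})F(\{f_s\})$.

```python
from itertools import product

def product_map(fs):
    """F({f_s}) for Cartesian products of sets: the map (x_s) -> (f_s(x_s)),
    which is the unique map f with q_s o f = f_s o p_s for every s."""
    return lambda x: tuple(f_s(x_s) for f_s, x_s in zip(fs, x))

# Hypothetical families: X_0 = {0, 1, 2}, X_1 = {0, 1, 2, 3}.
Xs = [range(3), range(4)]
fs = [lambda x: x % 2, lambda x: x // 2]       # f_s : X_s -> Y_s
gs = [lambda y: y + 10, lambda y: -y]          # g_s : Y_s -> Z_s
F_f, F_g = product_map(fs), product_map(gs)
F_gf = product_map([lambda x, g=g, f=f: g(f(x)) for g, f in zip(gs, fs)])

for x in product(*Xs):
    # q_s o F(f) = f_s o p_s, where p_s and q_s take the s-th coordinate.
    assert all(F_f(x)[s] == fs[s](x[s]) for s in range(len(Xs)))
    # Functoriality: F({g_s f_s}) = F({g_s}) F({f_s}).
    assert F_gf(x) == F_g(F_f(x))
```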

We turn to coproducts, which include direct sums. Let $\{X_s\}_{s \in S}$ be an object in
$C^S$. A coproduct of $\{X_s\}_{s \in S}$ is a pair $(X, \{i_s\}_{s \in S})$ such that X is in Obj(C) and
each $i_s$ is in $\mathrm{Morph}_C(X_s, X)$ with the following universal mapping property:
whenever A in Obj(C) is given and a morphism $\varphi_s \in \mathrm{Morph}_C(X_s, A)$ is given
for each s, then there exists a unique morphism $\varphi \in \mathrm{Morph}_C(X, A)$ such that
$\varphi i_s = \varphi_s$ for all s. The relevant diagram is pictured in Figure 4.12.

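[Diagram: $i_s : X_s \to X$, $\varphi_s : X_s \to A$, and the induced morphism $\varphi : X \to A$ with $\varphi i_s = \varphi_s$.]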

FIGURE  4.12.  Universal  mapping  property  of  a  coproduct  in  a  category. 
Examples  of  coproducts. 

(1) Coproducts exist in the category of vector spaces over a field $\mathbb{F}$. If vector
spaces $V_s$ indexed by a nonempty set S are given, then their coproduct exists in
the category, and an example is their external direct sum $\bigoplus_{s \in S} V_s$, according to
Figure 2.5 and the discussion around it.

(2)  Coproducts  exist  in  the  category  of  all  abelian  groups.  If  abelian  groups  Gs 
indexed  by  a  nonempty  set  S  are  given,  then  their  coproduct  exists  in  the  category, 
and an example is their external direct sum $\bigoplus_{s \in S} G_s$, according to Figure 4.4 and
Proposition  4.17. 

(3)  Coproducts  exist  in  the  category  of  all  sets.  If  sets  Xs  indexed  by  a  nonempty 
set  S  are  given,  then  their  coproduct  exists  in  the  category,  and  an  example  is  their 
disjoint union $\bigsqcup_{s \in S} \{(x_s, s) \mid x_s \in X_s\}$. The verification appears as Problem 74
at  the  end  of  the  chapter. 
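The construction in Example 3 can be illustrated on finite data. In the Python sketch below (hypothetical sets and maps, S = {0, 1}) the injections are $i_s(x) = (x, s)$ and the induced map sends $(x, s)$ to $\varphi_s(x)$; it is forced, since every element of the disjoint union has the form $i_s(x)$. The sets are chosen to overlap so that the tags visibly keep their elements apart.

```python
def disjoint_union(sets):
    """Coproduct of finite sets X_0, ..., X_{n-1}: the tagged union
    {(x, s) : x in X_s} together with the injections i_s(x) = (x, s)."""
    U = {(x, s) for s, Xs in enumerate(sets) for x in Xs}
    injections = [lambda x, s=s: (x, s) for s in range(len(sets))]
    return U, injections

def induced_map(phis):
    """Given phi_s : X_s -> A, return phi on the disjoint union with
    phi o i_s = phi_s, namely (x, s) -> phi_s(x)."""
    return lambda tagged: phis[tagged[1]](tagged[0])

X0, X1 = {1, 2}, {2, 3}          # overlapping sets: the tags keep them apart
U, (i0, i1) = disjoint_union([X0, X1])
phi0 = lambda x: 10 * x          # hypothetical phi_0 : X_0 -> integers
phi1 = lambda x: -x              # hypothetical phi_1 : X_1 -> integers
phi = induced_map([phi0, phi1])

assert all(phi(i0(x)) == phi0(x) for x in X0)
assert all(phi(i1(x)) == phi1(x) for x in X1)
```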

20Since $\mathrm{Obj}(C^S)$ need not be a set, it is best to be wary of applying the Axiom of Choice when
the indexing of sets is given by $\mathrm{Obj}(C^S)$. Instead, one makes the choice only for all objects in some
set  of  objects  large  enough  for  a  particular  application. 
(4) Coproducts exist in the category of all groups. Suppose that groups $G_s$
indexed by a nonempty set S are given. It will be shown in Chapter VII that
the coproduct is the “free product” $*_{s \in S} G_s$ that is defined in that chapter. In the
special case that each $G_s$ is the group $\mathbb{Z}$ of integers, the free product coincides
with  the  free  group  on  S.  Therefore,  even  if  all  the  groups  Gs  are  abelian,  their 
coproduct  need  not  be  a  subgroup  of  the  direct  product  and  need  not  even  be 
abelian.  In  particular  it  need  not  coincide  with  the  direct  sum. 

A  coproduct  of  objects  in  a  category  need  not  exist  in  the  category.  Problem  76 
at  the  end  of  the  chapter  offers  an  example  that  the  reader  is  invited  to  check. 

Proposition 4.64. Let C be a category, and let S be a nonempty set. If $\{X_s\}_{s \in S}$
is an object in $C^S$ and if $(X, \{i_s\})$ and $(X', \{i'_s\})$ are two coproducts, then there
exists a unique morphism $\Phi : X \to X'$ such that $i'_s = \Phi \circ i_s$ for all $s \in S$, and $\Phi$
is an isomorphism.

Remarks. There is no assertion that $i_s$ is one-one. In fact, “one-one” has
no  meaning  for  a  general  category.  This  proposition  may  be  derived  quickly 
from  Proposition  4.63  by  a  certain  duality  argument  that  is  discussed  in  Problems 
78-80  at  the  end  of  the  chapter.  Here  we  give  a  direct  argument  without  taking 
advantage  of  duality. 

Proof. In Figure 4.12 let $A = X'$ and $\varphi_s = i'_s$. If $\Phi \in \mathrm{Morph}(X, X')$ is the
morphism produced by the fact that X is a coproduct, then we have $\Phi i_s = i'_s$ for
all s. Reversing the roles of X and X', we obtain a morphism $\Phi' \in \mathrm{Morph}(X', X)$
with $\Phi' i'_s = i_s$ for all s. Therefore $(\Phi'\Phi)i_s = \Phi' i'_s = i_s$.

In Figure 4.12 we next let $A = X$ and $\varphi_s = i_s$ for all s. Then the identity $1_X$
in $\mathrm{Morph}(X, X)$ has the same property $1_X i_s = i_s$ relative to all $i_s$ that $\Phi'\Phi$ has,
and the uniqueness says that $\Phi'\Phi = 1_X$. Reversing the roles of X and X', we
obtain $\Phi\Phi' = 1_{X'}$. Therefore $\Phi$ is an isomorphism.

For uniqueness suppose that $\Psi \in \mathrm{Morph}(X, X')$ is another morphism with
$i'_s = \Psi i_s$ for all $s \in S$. Then the argument of the previous paragraph shows that
$\Phi'\Psi = 1_X$. Consequently $\Psi = 1_{X'}\Psi = (\Phi\Phi')\Psi = \Phi(\Phi'\Psi) = \Phi 1_X = \Phi$. □

If  coproducts  always  exist  in  a  particular  category,  they  are  not  unique,  only 
unique  up  to  canonical  isomorphism.  Such  a  coproduct  is  commonly  denoted  by 
$\coprod_{s \in S} X_s$, even though it is not uniquely defined. As with products, it is customary
to treat the coproduct over S as a covariant functor $F : C^S \to C$, the effect of the
functor on objects being given by $F(\{X_s\}_{s \in S}) = \coprod_{s \in S} X_s$. For a well-defined
functor we have to fix a choice of coproduct for each object under consideration
in $\mathrm{Obj}(C^S)$. For the effect of F on morphisms, we argue with the universal
mapping property. Thus let $\{X_s\}_{s \in S}$ and $\{Y_s\}_{s \in S}$ be objects in $C^S$, let $f_s$ be in
$\mathrm{Morph}_C(X_s, Y_s)$ for all s, and let the coproducts in question be $(\coprod_{s \in S} X_s, \{i_s\}_{s \in S})$
and $(\coprod_{s \in S} Y_s, \{j_s\}_{s \in S})$. Then $j_{s_0} f_{s_0}$ is in $\mathrm{Morph}_C(X_{s_0}, \coprod_{s \in S} Y_s)$ for each $s_0$, and
the universal mapping property gives us f in $\mathrm{Morph}_C(\coprod_{s \in S} X_s, \coprod_{s \in S} Y_s)$ such
that $f i_s = j_s f_s$ for all s. We define this f to be $F(\{f_s\}_{s \in S})$, and we readily check
that F is a functor.

Universal  mapping  properties  occur  in  other  contexts  than  for  products  and 
coproducts.  We  have  already  seen  them  in  connection  with  homomorphisms  on 
free  abelian  groups  and  with  substitution  homomorphisms  on  polynomial  rings, 
and  more  such  properties  will  occur  in  the  development  of  tensor  products  in 
Chapter  VI.  A  general  framework  for  discussing  universal  mapping  properties 
appears  in  the  problems  at  the  end  of  Chapter  VI. 


12.  Problems 

1. Let G be a group in which all elements other than the identity have order 2. Prove
that  G  is  abelian. 

2. The dihedral group $D_4$ of order 8 can be viewed as a subgroup of the symmetric
group $\mathfrak{S}_4$ of order 24. Find 8 explicit permutations in $\mathfrak{S}_4$ forming a subgroup
isomorphic to $D_4$.

2A. Let g be an element of finite order ord(g) in a group G. Prove that

(a) $g^{-1}$ has the same order as g.

(b) $g^k = 1$ if and only if ord(g) divides k.

(c) for each $r \in \mathbb{Z}$, the order of $g^r$ is $\mathrm{ord}(g)/\mathrm{GCD}(\mathrm{ord}(g), r)$.
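A numerical sanity check of part (c), not a proof. The sketch assumes an additive model of our own choosing: in the cyclic group $\mathbb{Z}/n\mathbb{Z}$ the class of 1 plays the role of g, so ord(g) = n and $g^r$ corresponds to the class of r.

```python
from math import gcd

def additive_order(a, n):
    """Order of the class of a in the additive cyclic group Z/nZ."""
    k = 1
    while (k * a) % n != 0:
        k += 1
    return k

# Model: g = class of 1 in Z/nZ, so ord(g) = n and g^r is the class of r.
for n in range(1, 30):
    for r in range(-40, 40):
        assert additive_order(r % n, n) == n // gcd(n, r)
```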

3. Suppose G is a finite group, H is a subgroup, and $a \in G$ is an element with $a^l$
in H for some integer l with $\mathrm{GCD}(l, |G|) = 1$. Prove that a is in H.

4.  Let  G  be  a  group,  and  define  a  new  group  G'  to  have  the  same  underlying  set  as 
G but to have multiplication given by $a \circ b = ba$. Prove that G' is a group and
that  it  is  isomorphic  to  G . 

5. Prove that if G is an abelian group and n is an integer, then $a \mapsto a^n$ is a
homomorphism of G. Give an example of a nonabelian group for which $a \mapsto a^2$
is not a homomorphism.

6.  Suppose  that  G  is  a  group  and  that  H  and  K  are  normal  subgroups  of  G  with 
$H \cap K = \{1\}$. Verify that the set HK of products is a subgroup and that this
subgroup is isomorphic as a group to the external direct product $H \times K$.

7. Take as known that 8191 is prime, so that $\mathbb{Z}/8191\mathbb{Z}$ is a field. Without carrying
through the computations and without advocating trial and error, describe what
steps you would carry out to solve for x mod 8191 such that $1234x \equiv 1 \bmod 8191$.
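One standard way to organize the steps is the Euclidean algorithm run backwards, which expresses 1 = GCD(1234, 8191) as an integer combination $u \cdot 1234 + v \cdot 8191$, so that $x \equiv u \bmod 8191$. The following Python sketch of that approach is one possible answer, not necessarily the intended one; Python's built-in three-argument `pow` with exponent -1 (Python 3.8 and later) serves as an independent check.

```python
def extended_gcd(a, b):
    """Return (d, u, v) with d = GCD(a, b) = u*a + v*b."""
    if b == 0:
        return a, 1, 0
    d, u, v = extended_gcd(b, a % b)
    # d = u*b + v*(a mod b) = v*a + (u - (a // b)*v)*b
    return d, v, u - (a // b) * v

def inverse_mod(a, p):
    """Solve a*x = 1 mod p, assuming GCD(a, p) = 1."""
    d, u, _ = extended_gcd(a, p)
    if d != 1:
        raise ValueError("a is not invertible modulo p")
    return u % p

x = inverse_mod(1234, 8191)
assert (1234 * x) % 8191 == 1
assert x == pow(1234, -1, 8191)   # independent check (Python 3.8+)
```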
8. (Wilson's Theorem) Let p be an odd prime. Starting from the fact that
$1, 2, \dots, p-1$ are roots of the polynomial $X^{p-1} - 1$ in $\mathbb{F}_p$, prove
that $(p-1)! \equiv -1 \bmod p$.
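A quick numerical check of the starting fact and of the conclusion for the odd primes below 200 (an illustration only, not a proof):

```python
from math import factorial

def is_prime(n):
    """Trial division; adequate for the small numbers used here."""
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))

for p in filter(is_prime, range(3, 200)):
    # The starting fact: every a = 1, ..., p-1 is a root of X^(p-1) - 1 mod p.
    assert all(pow(a, p - 1, p) == 1 for a in range(1, p))
    # Wilson's Theorem: (p - 1)! = -1 mod p.
    assert factorial(p - 1) % p == p - 1
```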

9. Classify, up to isomorphism, all groups of order $p^2$ if p is a prime.

10.  This  problem  concerns  conjugacy  classes  in  a  group  G. 

(a)  Prove  that  all  elements  of  a  conjugacy  class  have  the  same  order. 

(b)  Prove  that  if  ab  is  in  a  conjugacy  class,  so  is  ba. 

11. (a) Find explicitly all the conjugacy classes in the alternating group $\mathfrak{A}_4$.

(b) For each conjugacy class in $\mathfrak{A}_4$, find the centralizer of one element in the
class.

(c) Prove that $\mathfrak{A}_4$ has no subgroup isomorphic to $C_6$ or $\mathfrak{S}_3$.

12. Prove that the alternating group $\mathfrak{A}_5$ has no subgroup of order 30.

13. Let G be a nonabelian group of order $p^n$, where p is prime. Prove that any
subgroup of order $p^{n-1}$ is normal.

14. Let G be a finite group, and let H be a normal subgroup. If $|H| = p$ and p is
the  smallest  prime  dividing  |G|,  prove  that  H  is  contained  in  the  center  of  G. 

15. Let G be a group. An automorphism of G of the form $x \mapsto gxg^{-1}$ is called an
inner automorphism. Prove that the set of inner automorphisms is a normal
subgroup of the group Aut G of all automorphisms and is isomorphic to $G/Z_G$.

16. (a) Prove that $\mathrm{Aut}\,C_m$ is isomorphic to $(\mathbb{Z}/m\mathbb{Z})^\times$.

(b) Find a value of m for which $\mathrm{Aut}\,C_m$ is not cyclic.

17. Fix $n \ge 2$. In the symmetric group $\mathfrak{S}_n$, for each integer k with $1 \le k \le n/2$, let
$C_k$ be the set of elements in $\mathfrak{S}_n$ that are products of k disjoint transpositions.

(a) Prove that if $\tau$ is an automorphism of $\mathfrak{S}_n$, then $\tau(C_1) = C_k$ for some k.

(b) Prove that $|C_k| = \binom{n}{2k}\dfrac{(2k)!}{2^k\,k!}$.

(c) Prove that $|C_k| \ne |C_1|$ unless $k = 1$ or $n = 6$. (Educational note: From this,
it follows that $\tau(C_1) = C_1$ except possibly when $n = 6$. One can deduce
as a consequence that every automorphism of $\mathfrak{S}_n$ is inner except possibly
when $n = 6$.)
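The formula in part (b) can be checked by brute force for small n (a check, not a proof). The Python sketch below counts the permutations of {0, ..., n-1} whose cycle type is k transpositions plus n - 2k fixed points, compares the count with the formula, and exhibits the coincidence $|C_3| = |C_1|$ for n = 6 allowed in part (c).

```python
from itertools import permutations
from math import comb, factorial

def cycle_lengths(perm):
    """Lengths of the cycles of a permutation of {0, ..., n-1} (a tuple)."""
    seen, lengths = set(), []
    for start in range(len(perm)):
        if start not in seen:
            length, x = 0, start
            while x not in seen:
                seen.add(x)
                x = perm[x]
                length += 1
            lengths.append(length)
    return lengths

def brute_count(n, k):
    """Number of elements of S_n that are products of k disjoint transpositions."""
    target = [1] * (n - 2 * k) + [2] * k
    return sum(1 for perm in permutations(range(n))
               if sorted(cycle_lengths(perm)) == target)

def formula(n, k):
    """binom(n, 2k) * (2k)! / (2^k k!)."""
    return comb(n, 2 * k) * factorial(2 * k) // (2 ** k * factorial(k))

for n in range(2, 8):
    for k in range(1, n // 2 + 1):
        assert brute_count(n, k) == formula(n, k)

# The coincidence allowed in part (c): |C_3| = |C_1| when n = 6.
assert formula(6, 3) == formula(6, 1) == 15
```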

18. Give an example: G is a group with a normal subgroup N, N has a subgroup M
that is normal in N, yet M is not normal in G.

19. Show that the cyclic group $C_{rs}$ is isomorphic to $C_r \times C_s$ if and only if
$\mathrm{GCD}(r, s) = 1$.

20.  How  many  abelian  groups,  up  to  isomorphism,  are  there  of  order  27? 
21. Let G be the free abelian group with $\mathbb{Z}$ basis $\{x_1, x_2, x_3\}$. Let H be the subgroup
of G generated by $\{u_1, u_2, u_3\}$, where

$u_1 = 3x_1 + 2x_2 + 5x_3$,

$u_2 = x_2 + 3x_3$,

$u_3 = x_2 + 5x_3$.

Express G/H as a direct sum of cyclic groups.

22. Let $\{e_1, e_2, e_3, e_4\}$ be the standard basis of $\mathbb{R}^4$. Let G be the additive subgroup
of $\mathbb{R}^4$ generated by the four elements

$e_1$, $e_1 + e_2$, $\tfrac{1}{2}(e_1 + e_2 + e_3 + e_4)$, $\tfrac{1}{2}(e_1 + e_2 + e_3 - e_4)$,

and let H be the subgroup of G generated by the four elements

$e_1 - e_2$, $e_2 - e_3$, $e_3 - e_4$, $e_3 + e_4$.

Identify the abelian group G/H as a direct sum of cyclic groups.

23. Let G be the free abelian group with $\mathbb{Z}$ basis $\{x_1, \dots, x_n\}$, and let H be the
subgroup generated by $\{u_1, \dots, u_m\}$, where

$\begin{pmatrix} u_1 \\ \vdots \\ u_m \end{pmatrix} = C \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}$

for an m-by-n matrix C of integers. Prove that the number of summands $\mathbb{Z}$ in the
decomposition of G/H into cyclic groups is equal to n minus the rank of the matrix C
when C is considered as in $M_{mn}(\mathbb{Q})$.

24. Prove that every abelian group is the homomorphic image of a free abelian group.

25. Let G be a group, and let H and K be subgroups.

(a) For x and y in G, prove that $xH \cap yK$ is empty or is a coset of $H \cap K$.

(b) Deduce from (a) that if H and K have finite index in G, then so does $H \cap K$.

26. Let G be a free abelian group of finite rank n, and let H be a free abelian subgroup
of rank n. Prove that H has finite index in G.

27. Let $G = \mathfrak{S}_4$ be the symmetric group on four letters.

(a) Find a Sylow 2-subgroup of G. How many Sylow 2-subgroups are there,
and why?

(b) Find a Sylow 3-subgroup of G. How many Sylow 3-subgroups are there,
and why?

28. Let H be a subgroup of a group G. Prove or disprove that the normalizer N(H)
of H in G is a normal subgroup of G.

29. How many elements of order 7 are there in a simple group of order 168?

30. Let G be a group of order $pq^2$, where p and q are primes with $p < q$. Let $S_p$
and $S_q$ be Sylow subgroups for the primes p and q. Prove that G is a semidirect
product of $S_p$ and $S_q$ with $S_q$ normal.
31. Suppose that G is a finite group and that H is a subgroup whose index in G is
a  prime  p.  By  considering  the  action  of  G  on  the  set  of  subgroups  conjugate 
to  H  and  considering  the  possibilities  for  the  normalizer  N(H),  determine  the 
possibilities  for  the  number  of  subgroups  conjugate  to  H. 

32.  Let  G  be  a  group  of  order  24,  let  H  be  a  subgroup  of  order  8,  and  assume  that  H 
is  not  normal. 

(a)  Using  the  Sylow  Theorems,  explain  why  H  has  exactly  3  conjugates  in  G, 
counting  H  itself  as  one. 

(b)  Show  how  to  use  the  conjugates  in  (a)  to  define  a  homomorphism  of  G  into 
the symmetric group $\mathfrak{S}_3$ on three letters.

(c)  Use  the  homomorphism  of  (b)  to  conclude  that  G  is  not  simple. 

33.  Let  G  be  a  group  of  order  36.  Arguing  in  the  style  of  the  previous  problem,  show 
that there is a nontrivial homomorphism of G into the symmetric group $\mathfrak{S}_4$.

34. Let G be a group of order 2pq, where p and q are primes with $2 < p < q$.

(a) Prove that if $q + 1 \ne 2p$, then a Sylow q-subgroup is normal.

(b) Suppose that $q + 1 = 2p$, let H be a Sylow p-subgroup, and let K be a
Sylow q-subgroup. Prove that at least one of H and K is normal, that the
set HK of products is a subgroup, and that the subgroup HK is cyclic of
index 2 in G.

Problems 35-38 concern the detection of isomorphisms among semidirect products.
For the first two of the problems, let H and K be groups, and let $\varphi_1 : H \to \mathrm{Aut}\,K$
and $\varphi_2 : H \to \mathrm{Aut}\,K$ be homomorphisms.

35. Suppose that $\varphi_2 = \varphi_1 \circ \varphi$ for some automorphism $\varphi$ of H. Define
$\psi : H \times_{\varphi_1} K \to H \times_{\varphi_2} K$ by $\psi(h, k) = (\varphi^{-1}(h), k)$. Prove that $\psi$ is an isomorphism.

36. Suppose that $\varphi_2 = \varphi \circ \varphi_1$ for some inner automorphism $\varphi$ of Aut K in the sense
of Problem 15, i.e., $\varphi : \mathrm{Aut}\,K \to \mathrm{Aut}\,K$ is to be given by $\varphi(x) = \alpha x \alpha^{-1}$ with $\alpha$
in Aut K. Define $\psi : H \times_{\varphi_1} K \to H \times_{\varphi_2} K$ by $\psi(h, k) = (h, \alpha(k))$. Prove that
$\psi$ is an isomorphism.

37. Suppose that p and q are primes and that the cyclic group $C_p$ acts on $C_q$ by
automorphisms with a nontrivial action. Prove that p divides $q - 1$.

38. Suppose that p and q are primes such that p divides $q - 1$. Let $\tau_1$ and $\tau_2$
be nontrivial homomorphisms from $C_p$ to $\mathrm{Aut}\,C_q$. Prove that $C_p \times_{\tau_1} C_q \cong
C_p \times_{\tau_2} C_q$, and conclude that there is only one nonabelian semidirect product
$C_p \times_\tau C_q$ up to isomorphism.

Problems  39-44  discuss  properties  of  groups  of  order  8,  obtaining  a  classification  of 
these  groups  up  to  isomorphism. 

39. Prove that the five groups $C_8$, $C_4 \times C_2$, $C_2 \times C_2 \times C_2$, $D_4$, and $H_8$ are mutually
nonisomorphic and that the first three exhaust the abelian groups of order 8, apart
from isomorphisms.
40. (a) Find a composition series for the 8-element dihedral group D4.

(b) Find a composition series for the 8-element quaternion group H8.

41. (a) Prove that every subgroup of the quaternion group H8 is normal.

(b) Identify the conjugacy classes in H8.

(c) Compute the order of Aut H8.

42.  Suppose  that  G  is  a  nonabelian  group  of  order  8.  Prove  that  G  has  an  element  of 
order  4  but  no  element  of  order  8. 

43. Let G be a nonabelian group of order 8, and let K be the copy of C4 generated
by some element of order 4. If G has some element of order 2 that is not in K,
prove that G ≅ D4.

44. Let G be a nonabelian group of order 8, and let K be the copy of C4 generated
by some element of order 4. If G has no element of order 2 that is not in K,
prove that G ≅ H8.
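The invariant behind Problems 43 and 44 can be checked by brute force. The sketch below realizes D4 as the symmetries of a square acting on its vertices 0, 1, 2, 3 and H8 as the quaternions ±1, ±i, ±j, ±k; these particular models, and all helper names, are our choices for illustration. It tallies the elements of each group by order.

```python
from collections import Counter


def generate(gens, identity, mul):
    """Return the finite group generated by gens, found by breadth-first search."""
    elements, frontier = {identity}, [identity]
    while frontier:
        new = []
        for x in frontier:
            for g in gens:
                y = mul(x, g)
                if y not in elements:
                    elements.add(y)
                    new.append(y)
        frontier = new
    return elements


def order(x, identity, mul):
    """Smallest n >= 1 with x^n equal to the identity."""
    n, y = 1, x
    while y != identity:
        y, n = mul(y, x), n + 1
    return n


def compose(f, g):
    """Composition f∘g (apply g first) of permutations stored as tuples."""
    return tuple(f[k] for k in g)


def qmul(x, y):
    """Product of quaternions a + bi + cj + dk stored as (a, b, c, d)."""
    a1, b1, c1, d1 = x
    a2, b2, c2, d2 = y
    return (a1*a2 - b1*b2 - c1*c2 - d1*d2,
            a1*b2 + b1*a2 + c1*d2 - d1*c2,
            a1*c2 - b1*d2 + c1*a2 + d1*b2,
            a1*d2 + b1*c2 - c1*b2 + d1*a2)


e4, one = (0, 1, 2, 3), (1, 0, 0, 0)
rotation, reflection = (1, 2, 3, 0), (0, 3, 2, 1)   # quarter turn; flip fixing 0 and 2
D4 = generate([rotation, reflection], e4, compose)
H8 = generate([(0, 1, 0, 0), (0, 0, 1, 0)], one, qmul)   # generated by i and j

d4_orders = Counter(order(x, e4, compose) for x in D4)
h8_orders = Counter(order(x, one, qmul) for x in H8)
print(len(D4), dict(d4_orders))
print(len(H8), dict(h8_orders))
```

D4 has five elements of order 2 (the half turn and the four reflections), while H8 has only one, namely −1. That is why an element of order 2 outside K separates the two cases.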

Problems 45-48 classify groups of order 12, making use of Proposition 4.61,
Problem 15, and Problems 35-38. Let G be a group of order 12, let H be a Sylow
3-subgroup, and let K be a Sylow 2-subgroup. Proposition 4.61 says that at least one
of H and K is normal. Consequently there are three cases, and these are addressed
by the first three of the problems.

45.  Verify  that  there  are  only  two  possibilities  for  G  up  to  isomorphism  if  G  is  abelian. 

46. Suppose that K is normal, so that $G \cong H \times_\tau K$. Prove that either

(i) τ is trivial or

(ii) τ is nontrivial and K ≅ C2 × C2,

and deduce that G is abelian if (i) holds and that G ≅ A4 if (ii) holds.

47. Suppose that H is normal, so that $G \cong K \times_\tau H$. Prove that one of the conditions

(i) τ is trivial,

(ii) K ≅ C2 × C2 and τ is nontrivial,

(iii) K ≅ C4 and τ is nontrivial

holds, and deduce that G is abelian if (i) holds, that G ≅ D6 if (ii) holds, and
that G is nonabelian and is not isomorphic to A4 or D6 if (iii) holds.

48.  In  the  setting  of  the  previous  problem,  prove  that  there  is  one  and  only  one  group, 
up  to  isomorphism,  satisfying  condition  (iii),  and  find  the  order  of  each  of  its 
elements. 

Problems 49-52 assume that p and q are primes with p < q. The problems go in the
direction of classifying finite groups of order $p^2q$.

49. If G is a group of order $p^2q$, prove that either $p^2q = 12$ or a Sylow q-subgroup
is normal.

50. If $p^2$ divides q − 1, exhibit three nonabelian groups of order $p^2q$ that are mutually
nonisomorphic.
51. If p divides q − 1 but $p^2$ does not divide q − 1, exhibit two nonabelian groups
of order $p^2q$ that are not isomorphic.

52. If p does not divide q − 1, prove that any group of order $p^2q$ is abelian.

Problems 53-54 concern nonabelian groups of order 27.

53. (a) Show that multiplication by the elements 1, 4, 7 mod 9 defines a nontrivial
action of Z/3Z on Z/9Z by automorphisms.

(b) Show from (a) that there exists a nonabelian group of order 27.

(c) Show that the group in (b) is generated by elements a and b that satisfy

$$a^9 = b^3 = b^{-1}aba^{-4} = 1.$$

54. Show that any nonabelian group of order 27 having a subgroup H isomorphic to
C9 and an element of order 3 not lying in H is isomorphic to the group constructed
in the previous problem.
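Problem 53 can be made concrete with a small computation. The sketch below builds a semidirect product of Z/9Z by Z/3Z on the 27 pairs (k, m) using the multiplication rule (k1, m1)(k2, m2) = (k1 + 7^{m1} k2, m1 + m2). This convention, and the choice of the unit 7 (one of 1, 4, 7 mod 9), are our assumptions, picked so that a = (1, 0) and b = (0, 1) satisfy the relation of part (c) as written; the text's own convention for semidirect products may order the factors differently.

```python
from itertools import product


# Elements are pairs (k, m) with k in Z/9Z and m in Z/3Z.
def act(m, k):
    """Action of m in Z/3Z on k in Z/9Z: multiplication by 7**m (7**3 = 1 mod 9)."""
    return (pow(7, m, 9) * k) % 9


def mul(x, y):
    (k1, m1), (k2, m2) = x, y
    return ((k1 + act(m1, k2)) % 9, (m1 + m2) % 3)


def power(x, n):
    result = (0, 0)                      # (0, 0) is the identity element
    for _ in range(n):
        result = mul(result, x)
    return result


G = list(product(range(9), range(3)))
a, b = (1, 0), (0, 1)
b_inv = power(b, 2)                      # b has order 3
a_inv4 = power(a, 5)                     # a^{-4} = a^5 since a has order 9

associative = all(mul(mul(x, y), z) == mul(x, mul(y, z)) for x in G for y in G for z in G)
nonabelian = any(mul(x, y) != mul(y, x) for x in G for y in G)
print(len(G), associative, nonabelian)
print(power(a, 9) == (0, 0), power(b, 3) == (0, 0),
      mul(mul(mul(b_inv, a), b), a_inv4) == (0, 0))
```

Every printed value is True (after the order 27), which is a numerical check of parts (b) and (c) for this particular model, not a proof.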

Problems  55-62  give  a  construction  of  infinitely  many  simple  groups,  some  of  them 
finite and some infinite. Let F be a field. For n ≥ 2, let SL(n, F) be the special linear
group for the space $F^n$ of n-dimensional column vectors. The center Z of SL(n, F)
consists of the scalar multiples of the identity, the scalar being an nth root of 1. Let
PSL(n, F) = SL(n, F)/Z. It is known that PSL(n, F) is simple except for PSL(2, $\mathbb{F}_2$)
and PSL(2, $\mathbb{F}_3$). These problems will show that PSL(2, F) is simple if |F| > 5 and
F  is  not  of  characteristic  2.  Most  of  the  argument  will  consider  SL(2,  F),  and  the 
passage  to  PSL  will  occur  only  at  the  very  end.  In  Problems  56-61,  G  denotes  a 
normal  subgroup  of  SL(2,  F)  that  is  not  contained  in  the  center  Z,  and  it  is  to  be 
proved  that  G  =  SL(2,  F). 

55.  Suppose  that  F  is  a  finite  field  with  q  elements. 

(a)  By  considering  the  possibilities  for  the  first  column  of  a  matrix  and  then 
considering  the  possibilities  for  the  second  column  when  the  first  column  is 
fixed,  compute  |GL(2,  F)|  as  a  function  of  q. 

(b)  By  using  the  determinant  homomorphism,  compute  |SL(2,  F)|  in  terms  of 
|GL(2,  F)|. 

(c) Taking into account that F does not have characteristic 2, prove that
|PSL(2, F)| = ½|SL(2, F)|.

(d)  Show  for  a  suitable  finite  field  F  with  more  than  5  elements  that  PSL(2,  F) 
has  order  168. 
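The counts in Problem 55 can be cross-checked by brute force over the prime fields Z/qZ. The sketch below (function name ours) assumes q is an odd prime, so that Z/qZ is a field not of characteristic 2, and simply counts 2-by-2 matrices by determinant; it checks the formulas rather than proving them.

```python
from itertools import product


def matrix_group_orders(q):
    """Brute-force |GL(2,F)|, |SL(2,F)|, |PSL(2,F)| for F = Z/qZ with q an odd prime."""
    gl = sl = 0
    for a, b, c, d in product(range(q), repeat=4):
        det = (a * d - b * c) % q
        if det != 0:
            gl += 1
            if det == 1:
                sl += 1
    # The center Z of SL(2, F) is {I, -I}, and -I != I since q is odd.
    return gl, sl, sl // 2


for q in [3, 5, 7]:
    gl, sl, psl = matrix_group_orders(q)
    print(q, gl, sl, psl, gl == (q**2 - 1) * (q**2 - q), sl == gl // (q - 1))
```

For q = 7, a field with more than 5 elements, the last count is 168, in line with part (d).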

56. Let M be a member of G that is not in Z. Since M is not scalar, there exists a
column vector u with Mu not a multiple of u. Define v = Mu, so that (u, v) is
an ordered basis of $F^2$. By rewriting all matrices with the ordered basis (u, v),
show that there is no loss in generality in assuming that G contains a matrix
$A = \begin{pmatrix} 0 & -1 \\ 1 & d \end{pmatrix}$ for some d in F, if it is ultimately shown that G = SL(2, F).
57. Let a be a member of the multiplicative group $F^\times$ to be chosen shortly, and let
B be the member $\begin{pmatrix} a & 0 \\ 0 & a^{-1} \end{pmatrix}$ of SL(2, F). Prove that

(a) $B^{-1}A^{-1}BA$ is upper triangular and is in G,

(b) $B^{-1}A^{-1}BA$ has unequal diagonal entries if $a^4 \ne 1$,

(c) the condition in (b) can be satisfied for a suitable choice of a under the
assumption that |F| > 5.

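58. Suppose that $C = \begin{pmatrix} x & y \\ 0 & x^{-1} \end{pmatrix}$ is a member of G for some x ≠ ±1 and some y.

Taking $D = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$ and forming $CDC^{-1}D^{-1}$, show that G contains a matrix
$E = \begin{pmatrix} 1 & k \\ 0 & 1 \end{pmatrix}$ with k ≠ 0.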

59. By conjugating E by $\begin{pmatrix} a & 0 \\ 0 & a^{-1} \end{pmatrix}$, show that the set of k in F such that $\begin{pmatrix} 1 & k \\ 0 & 1 \end{pmatrix}$ is in
G is closed under multiplication by squares and under addition and subtraction.


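60. Using the identity $x = \tfrac{1}{4}(x + 1)^2 - \tfrac{1}{4}(x - 1)^2$, deduce from Problems 56-59
that G contains all matrices $\begin{pmatrix} 1 & t \\ 0 & 1 \end{pmatrix}$ with t ∈ F.

61. Show that $\begin{pmatrix} 1 & t \\ 0 & 1 \end{pmatrix}$ is conjugate to $\begin{pmatrix} 1 & 0 \\ -t & 1 \end{pmatrix}$, and show that the set of all matrices
$\begin{pmatrix} 1 & t \\ 0 & 1 \end{pmatrix}$ and $\begin{pmatrix} 1 & 0 \\ t & 1 \end{pmatrix}$ generates SL(2, F). Conclude that G = SL(2, F).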
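For prime fields, the generation statement in Problem 61 can be checked numerically. The sketch below takes F = Z/pZ with p prime (our assumption, so that field arithmetic is arithmetic mod p) and uses helper names of our own. It closes the identity under multiplication by the matrices [[1, t], [0, 1]] and [[1, 0], [t, 1]] and compares the size of the result with |SL(2, F)| = p(p² − 1) from Problem 55. A finite set containing the identity and closed under multiplication is a group, so the search returns the generated subgroup.

```python
def mat_mul(x, y, p):
    """Product over Z/pZ of 2-by-2 matrices stored as (a, b, c, d) = [[a, b], [c, d]]."""
    (a, b, c, d), (e, f, g, h) = x, y
    return ((a * e + b * g) % p, (a * f + b * h) % p,
            (c * e + d * g) % p, (c * f + d * h) % p)


def generated_subgroup(gens, p):
    """Breadth-first closure of the identity under right multiplication by gens."""
    identity = (1, 0, 0, 1)
    seen, frontier = {identity}, [identity]
    while frontier:
        nxt = []
        for x in frontier:
            for g in gens:
                y = mat_mul(x, g, p)
                if y not in seen:
                    seen.add(y)
                    nxt.append(y)
        frontier = nxt
    return seen


def elementary_generators(p):
    """The matrices [[1, t], [0, 1]] and [[1, 0], [t, 1]] for nonzero t in Z/pZ."""
    return ([(1, t, 0, 1) for t in range(1, p)] +
            [(1, 0, t, 1) for t in range(1, p)])


for p in [5, 7, 11]:
    size = len(generated_subgroup(elementary_generators(p), p))
    print(p, size, p * (p * p - 1))      # p(p^2 - 1) = |SL(2, Z/pZ)|
```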


62.  Using  the  First  Isomorphism  Theorem,  conclude  that  the  only  normal  subgroup 
of  PSL(2,  F)  other  than  {1}  is  PSL(2,  F)  itself. 


Problems 63-73 briefly introduce the theory of error-correcting codes. Let F be the
finite field Z/2Z. The vector space $F^n$ over F will be called Hamming space, and
its members are regarded as “words” (potential messages consisting of 0’s and 1’s).
The weight wt(c) of a word c is the number of nonzero entries in c. The Hamming
distance d(a, b) between words $a = (a_1, \dots, a_n)$ and $b = (b_1, \dots, b_n)$ is the weight
of a − b, i.e., the number of indices i with 1 ≤ i ≤ n and $a_i \ne b_i$. A code is a
nonempty subset C of $F^n$, and the minimal distance δ(C) of a code is the smallest
value of d(a, b) for a and b in C with a ≠ b. By convention if |C| = 1, take
δ(C) = n + 1. One imagines that members of C, which are called code words, are
allowable messages, i.e., words that can be stored and retrieved, or transmitted and
received. A code with minimal distance δ can then detect up to δ − 1 errors in a
word ostensibly from C that has been retrieved from storage or has been received
in a transmission. The code can correct up to (δ − 1)/2 errors because no word of
$F^n$ can be at distance ≤ (δ − 1)/2 from more than one word in C, by Problem 63
below. The interest is in linear codes, those for which C is a vector subspace. It
is desirable that each message have a high percentage of content and a relatively
low percentage of further information used for error correction; thus a fundamental
theoretical problem for linear codes is to find the maximum dimension of a linear
code if n and a lower bound on the minimal distance for the code are given. As a
practical matter, information is likely to be processed in packets of a standard length,
such as some power of 2. In many situations packets can be reprocessed if they have
been found to have errors. The initial interest is therefore in codes that can recognize
and possibly correct a small number of errors. The problems in this set are continued
at the ends of Chapters VII and IX.

63. Prove that the Hamming distance satisfies d(a, b) ≤ d(a, c) + d(c, b), and
conclude that if a word w in $F^n$ is at distance ≤ (D − 1)/2 from two distinct
members of the linear code C, then δ(C) < D.

64. Explain why the minimal distance δ(C) of a linear code C ≠ {0} is given by the
minimal weight of the nonzero words in C.
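A brute-force check of the definitions, and of the statement of Problem 64 for one example, might look as follows. This is a minimal sketch with helper names of our own; words are tuples of 0's and 1's, and the example code is the one spanned by the words 100110, 010101, 001011 (the rows of the matrix in Problem 67).

```python
from itertools import combinations, product


def weight(c):
    """wt(c): the number of nonzero entries of c."""
    return sum(1 for x in c if x)


def distance(a, b):
    """Hamming distance d(a, b): the number of positions where a and b differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def span(basis):
    """All F_2-linear combinations of the given basis words, as a set of tuples."""
    n = len(basis[0])
    return {tuple(sum(c * v[i] for c, v in zip(coeffs, basis)) % 2 for i in range(n))
            for coeffs in product((0, 1), repeat=len(basis))}


def minimal_distance(code, n):
    """delta(C), with the convention delta(C) = n + 1 when |C| = 1."""
    if len(code) == 1:
        return n + 1
    return min(distance(a, b) for a, b in combinations(code, 2))


def minimal_weight(code):
    """Smallest weight of a nonzero word of the code."""
    return min(weight(c) for c in code if any(c))


C = span([(1, 0, 0, 1, 1, 0), (0, 1, 0, 1, 0, 1), (0, 0, 1, 0, 1, 1)])
print(len(C), minimal_distance(C, 6), minimal_weight(C))
```

For this linear code the minimal distance over all pairs and the minimal nonzero weight agree, as Problem 64 predicts.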

65. Fix n ≥ 2. List δ(C) and dim C for the following elementary linear codes:

(a) C = 0.

(b) C = $F^n$.

(c) (Repetition code) C = {0, (1, 1, ..., 1)}.

(d) (Parity-check code) C = {c ∈ $F^n$ | wt(c) is even}. (Educational note: To
use this code, one sends the message in the first n − 1 bits and adjusts the
last bit so that the word is in C. If there is at most one error in the word, this
parity bit will tell when there is an error, but it will not tell where the error
occurs.)
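The procedure in the educational note to part (d) can be sketched in a few lines. The function names are ours, and words are Python lists of 0's and 1's.

```python
def encode_parity(message):
    """Append a parity bit so that the resulting word has even weight."""
    return list(message) + [sum(message) % 2]


def parity_fails(word):
    """True when the word has odd weight, i.e., is not in the parity-check code."""
    return sum(word) % 2 == 1


word = encode_parity([1, 0, 1, 1])        # [1, 0, 1, 1, 1]
single_errors = [word[:i] + [1 - word[i]] + word[i + 1:] for i in range(len(word))]
print(parity_fails(word), [parity_fails(r) for r in single_errors])
```

Every single-bit error is detected, but each one produces the same verdict, so the check says nothing about where the error occurred.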

66. One way to get a sense of what members of a linear code C in $F^n$ have small
weight starts by making a basis for the code into the row vectors of a matrix and
row reducing the matrix.

(a) Taking into account the distinction between corner variables and independent
variables in the process of row reduction, show that every basis vector of C
has weight at most the sum of 1 and the number of independent variables.
Conclude that dim C + δ(C) ≤ n + 1.

(b) Give an example of a linear code with δ(C) = 2 for which equality holds.

(c) Examining the argument for (a) more closely, show that 2 ≤ dim C ≤ n − 2
implies dim C + δ(C) ≤ n.

67. Let C be a linear code with a basis consisting of the rows of

$$\begin{pmatrix} 1&0&0&1&1&0 \\ 0&1&0&1&0&1 \\ 0&0&1&0&1&1 \end{pmatrix}.$$

Show that δ(C) = 3. Educational note: Thus for n = 6 and δ(C) = 3, we always have
dim C ≤ 3, and equality is possible.

68. (Hamming codes) The Hamming code C7 of order 7 is a certain linear code
having dim C7 = 4 that will be seen to have δ(C7) = 3. The code words of a
basis, with their commas removed, may be taken as

1110000, 1001100, 0101010, 1101001.

The basis may be described as follows. Bits 1, 2, 4 are used as checks. The
remaining bits are used to form the standard basis of $F^4$. What is put in bits
1, 2, 4 is the binary representation of the position of the nonzero entry in
positions 3, 5, 6, 7. When all 16 members of C7 are listed in the order dictated
by the bits in positions 3, 5, 6, 7, the resulting list is


   Decimal value                    Decimal value
   in 3, 5, 6, 7    Code word       in 3, 5, 6, 7    Code word

         0           0000000              8           1110000
         1           1101001              9           0011001
         2           0101010             10           1011010
         3           1000011             11           0110011
         4           1001100             12           0111100
         5           0100101             13           1010101
         6           1100110             14           0010110
         7           0001111             15           1111111

For the general members of C7, not just the basis vectors, the check bits in
positions 1, 2, 4 may be described as follows: the bit in position 1 is a parity
bit for the positions among 3, 5, 6, 7 having a 1 in their binary expansions, the
bit in position 2 is a parity bit for the positions among 3, 5, 6, 7 having a 2 in
their binary expansions, and the bit in position 4 is a parity bit for the positions
among 3, 5, 6, 7 having a 4 in their binary expansions. The Hamming code C8
of order 8 is obtained from C7 by adjoining a parity bit in position 8.

(a) Prove that δ(C7) = 3. (Educational note: Thus for n = 7 and δ(C) = 3, we
always have dim C ≤ 4, and equality is possible.)

(b) Prove that δ(C8) = 4.

(c) Describe how to form a generalization that replaces n = 8 by $n = 2^r$ with
r ≥ 3. The Hamming codes that are obtained will be called $C_{2^r-1}$ and $C_{2^r}$.

(d) Prove that $\dim C_{2^r-1} = \dim C_{2^r} = 2^r - r - 1$, $\delta(C_{2^r-1}) = 3$, and $\delta(C_{2^r}) = 4$.
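The description of the check bits translates directly into an encoder. The sketch below (function and constant names ours) builds all 16 words of C7 from the bits in positions 3, 5, 6, 7, adjoins an overall parity bit to obtain C8, and reports minimal weights, which for linear codes are the minimal distances by Problem 64. It is a numerical check of parts (a) and (b), not a proof.

```python
from itertools import product

DATA_POSITIONS = (3, 5, 6, 7)
CHECK_POSITIONS = (1, 2, 4)


def hamming7(data):
    """Encode the bits for positions 3, 5, 6, 7 (in that order) as a word of C7."""
    word = [0] * 8                        # index 0 unused; positions 1..7
    for pos, bit in zip(DATA_POSITIONS, data):
        word[pos] = bit
    for check in CHECK_POSITIONS:
        # Parity bit for the data positions whose binary expansion contains `check`.
        word[check] = sum(word[pos] for pos in DATA_POSITIONS if pos & check) % 2
    return tuple(word[1:])


def min_weight(code):
    """Minimal weight of a nonzero word; equals delta(C) for a linear code."""
    return min(sum(c) for c in code if any(c))


# itertools.product varies the last coordinate fastest, so C7[v] is the word
# whose bits in positions 3, 5, 6, 7 give the decimal value v, as in the table.
C7 = [hamming7(d) for d in product((0, 1), repeat=4)]
C8 = [c + (sum(c) % 2,) for c in C7]      # adjoin a parity bit in position 8
print(len(C7), min_weight(C7), min_weight(C8))
print("".join(map(str, C7[8])), "".join(map(str, C7[1])))
```

The second line reproduces the table entries for decimal values 8 and 1.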

69. The matrix

$$H = \begin{pmatrix} 1&0&1&0&1&0&1 \\ 0&1&1&0&0&1&1 \\ 0&0&0&1&1&1&1 \end{pmatrix},$$

when multiplied by any column vector c in the Hamming code C7, performs the three
parity checks done by bits 1, 2, 4 and described in the previous problem. Therefore
such a c must have Hc = 0.

(a) Prove that the condition works in the reverse direction as well, i.e., that Hc = 0
only if c is in C7.

(b) Deduce that if a received word r is not in C7 and if r is assumed to match
some word of C7 except in the ith position, then Hr matches the ith column
of H and this fact determines the integer i. (Educational note: Thus there is
a simple procedure for testing whether a received word is a code word and
for deciding, in the case that it is not a code word, what unique bit to change
to convert it into a code word.)
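The decoding procedure of Problem 69(b) can be sketched as follows, with words stored as lists of bits in positions 1 through 7 (Python indices 0 through 6). The function names are ours, and the sketch assumes, as in part (b), that at most one bit is wrong.

```python
H = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]


def syndrome(r):
    """Hr over F_2 for a word r of length 7, as a list of three bits."""
    return [sum(h * x for h, x in zip(row, r)) % 2 for row in H]


def correct(r):
    """Flip the bit indicated by the syndrome, assuming at most one error."""
    s = syndrome(r)
    i = s[0] + 2 * s[1] + 4 * s[2]       # column i of H is i written in binary
    fixed = list(r)
    if i:
        fixed[i - 1] ^= 1                # position i is Python index i - 1
    return fixed


c = [0, 1, 1, 0, 0, 1, 1]                # the code word with decimal value 11
r = c.copy()
r[4] ^= 1                                # corrupt position 5
print(syndrome(c), syndrome(r), correct(r) == c)
```

The syndrome of the corrupted word is [1, 0, 1], the fifth column of H, which points at position 5.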

70. Let r > 4. Prove for $2^{r-1} \le n \le 2^r - 1$ that any linear code C in $F^n$ with
δ(C) ≥ 3 has dim C ≤ n − r. Observe that equality holds for $C = C_{2^r-1}$.
71. The weight enumerator polynomial of a linear code C is the polynomial
$W_C(X, Y)$ in Z[X, Y] given by $W_C(X, Y) = \sum_{k=0}^{n} N_k(C)\, X^{n-k} Y^k$, where
$N_k(C)$ is the number of words of weight k in C.

(a) Compute $W_C(X, Y)$ for the following linear codes C: the 0 code, the code
$F^n$, the repetition code, the parity code, the code in Problem 67, the Hamming
code C7, and the Hamming code C8.

(b) Why is the coefficient of $X^n$ in $W_C(X, Y)$ necessarily equal to 1?

(c) Show that $W_C(X, Y) = \sum_{c \in C} X^{n - \mathrm{wt}(c)}\, Y^{\mathrm{wt}(c)}$.
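For part (a), a short computation can tabulate the numbers N_k(C) from a basis. The sketch below (helper names ours) does this for the Hamming code C7 using the basis listed in Problem 68; the same functions apply to the other codes in the list.

```python
from collections import Counter
from itertools import product


def span(basis):
    """All F_2-linear combinations of the basis words, as a set of tuples."""
    n = len(basis[0])
    return {tuple(sum(c * v[i] for c, v in zip(coeffs, basis)) % 2 for i in range(n))
            for coeffs in product((0, 1), repeat=len(basis))}


def weight_distribution(code):
    """The numbers N_k(C) as a dict {k: N_k(C)}, omitting weights that do not occur."""
    return dict(sorted(Counter(sum(c) for c in code).items()))


def enumerator_string(code, n):
    """Render W_C(X, Y) = sum_k N_k(C) X^(n-k) Y^k as text."""
    terms = []
    for k, count in weight_distribution(code).items():
        coeff = "" if count == 1 else str(count)
        x_part = "" if k == n else f"X^{n - k}"
        y_part = "" if k == 0 else f"Y^{k}"
        terms.append(coeff + x_part + y_part)
    return " + ".join(terms)


# Basis of the Hamming code C7 from Problem 68.
C7 = span([(1, 1, 1, 0, 0, 0, 0), (1, 0, 0, 1, 1, 0, 0),
           (0, 1, 0, 1, 0, 1, 0), (1, 1, 0, 1, 0, 0, 1)])
print(enumerator_string(C7, 7))
```

The leading term is $N_0(C)X^n$ with $N_0(C) = 1$ because the zero word is the only word of weight 0, which is the point of part (b).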

72. (Cyclic redundancy codes) Cyclic redundancy codes treat blocks of data as
coefficients of polynomials in F[X]. With the size n of data blocks fixed, one
fixes a monic generating polynomial $G(X) = 1 + a_1X + \cdots + a_{g-1}X^{g-1} + X^g$
with a nonzero constant term and with degree g suitably less than n. Data to
be transmitted are provided as members $(b_0, b_1, \dots, b_{n-g-1})$ of $F^{n-g}$ and are
converted into polynomials $B(X) = b_0 + b_1X + \cdots + b_{n-g-1}X^{n-g-1}$. Then
the n-tuple of coefficients of G(X)B(X) is transmitted. To decode a polynomial
P(X) that is received, one writes P(X) = G(X)Q(X) + R(X) via the division
algorithm. If R(X) = 0, it is assumed that P(X) is a code word. Otherwise
P(X) is definitely not a code word. Thus the code C amounts to the system
of coefficients of all polynomials G(X)B(X) with B(X) = 0 or deg B(X) ≤
n − g − 1. A basis of C is obtained by letting B(X) run through the monomials
1, X, ..., $X^{n-g-1}$, and therefore dim C = n − g. Take $G(X) = 1 + X + X^2 + X^4$
and n ≥ 8. Prove that δ(C) = 2.
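The encoding and decoding steps of Problem 72 can be sketched with polynomials over F stored as Python integers, bit i holding the coefficient of X^i; that representation and the helper names are our choices. The sketch takes n = 8, lists all code words G(X)B(X), and exhibits a word of weight 2.

```python
def poly_mul(a, b):
    """Product over F_2 of polynomials stored as ints (bit i = coefficient of X^i)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def poly_mod(p, g):
    """Remainder R(X) in the division algorithm P(X) = G(X)Q(X) + R(X) over F_2."""
    dg = g.bit_length() - 1
    while p and p.bit_length() - 1 >= dg:
        p ^= g << (p.bit_length() - 1 - dg)
    return p


G = 0b10111                 # G(X) = 1 + X + X^2 + X^4, so g = 4
n, g = 8, 4

# All code words G(X)B(X) with B(X) = 0 or deg B(X) <= n - g - 1.
code = [poly_mul(G, B) for B in range(2 ** (n - g))]
delta = min(bin(c).count("1") for c in code if c)
print(delta)

word = poly_mul(G, 0b1011)  # B(X) = 1 + X + X^3 gives G(X)B(X) = 1 + X^7
received = word ^ (1 << 6)  # flip the coefficient of X^6 in transit
print(bin(word), poly_mod(word, G), poly_mod(received, G) != 0)
```

The code word 1 + X^7 has weight 2, and the single flipped coefficient leaves a nonzero remainder, so the error is detected.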

73. (CRC-8) The cyclic redundancy code C bearing the name CRC-8 has
$G(X) = 1 + X + X^2 + X^8$. Prove that if 8 < n < 19, then δ(C) = 4. (Educational
note: It will follow from the theory of finite fields in Chapter IX, together with
the problems on coding theory at the end of that chapter, that n = 255 plays a
special role for this code, and δ(C) = 4 in that case.)

Problems 74-77 concern categories and functors. Problem 75 assumes knowledge of
point-set topology.

74.  Let  C  be  the  category  of  all  sets,  the  morphisms  being  the  functions  between  sets. 
Verify  that  the  disjoint  union  of  sets  is  a  coproduct. 

75. Let C be the category of all topological spaces, the morphisms being the continuous
functions. Let S be a nonempty set, and let $X_s$ be a topological space for
each s in S.

(a) Show that the Cartesian product of the spaces $X_s$, with the product topology,
is a product of the $X_s$’s.

(b) Show that the disjoint union of the spaces $X_s$, topologized so that a set E is
open if and only if its intersection with each $X_s$ is open, is a coproduct of
the $X_s$’s.
76. Taking a cue from the example of a category in which products need not exist,
exhibit  a  category  in  which  coproducts  need  not  exist. 

77. Let C be a category having just one object, say X, and suppose that every member
of Morph(X, X) is an isomorphism. Prove that Morph(X, X) is a group under
the law of composition for the category. Can every group be realized in this way,
up to isomorphism?

Problems 78-80 introduce a notion of duality in category theory and use it to derive
Proposition 4.64 from Proposition 4.63. If C is a category, then the opposite category
$C^{\mathrm{opp}}$ is defined to have $\mathrm{Obj}(C^{\mathrm{opp}}) = \mathrm{Obj}(C)$ and $\mathrm{Morph}_{C^{\mathrm{opp}}}(A, B) = \mathrm{Morph}_C(B, A)$.
If ∘ denotes the law of composition in C, then the law of composition $\circ_{\mathrm{opp}}$ in $C^{\mathrm{opp}}$ is
defined by $g \circ_{\mathrm{opp}} f = f \circ g$ for $f \in \mathrm{Morph}_{C^{\mathrm{opp}}}(A, B)$ and $g \in \mathrm{Morph}_{C^{\mathrm{opp}}}(B, C)$.

78. Verify that $C^{\mathrm{opp}}$ is indeed a category, that $(C^{\mathrm{opp}})^{\mathrm{opp}} = C$, and that to pass from
a diagram involving objects and morphisms in C to a corresponding diagram
involving the same objects and morphisms considered as in $C^{\mathrm{opp}}$, one leaves all
the vertices and labels alone and reverses the directions of all the arrows. Verify
also that the diagram in C commutes if and only if the diagram in $C^{\mathrm{opp}}$ commutes.

79. Let C be the category of all sets, the morphisms in $\mathrm{Morph}_C(A, B)$ being all
functions from A to B. Show that the morphisms in $\mathrm{Morph}_{C^{\mathrm{opp}}}(A, B)$ cannot
necessarily all be regarded as functions from A to B.

80. Suppose that S is a nonempty set and that $\{X_s\}_{s \in S}$ is a family of objects in C.

(a) Prove that if $(X, \{p_s\}_{s \in S})$ is a product of $\{X_s\}_{s \in S}$ in C, then $(X, \{p_s\}_{s \in S})$ is
a coproduct of $\{X_s\}_{s \in S}$ in $C^{\mathrm{opp}}$, and that if $(X, \{p_s\}_{s \in S})$ is a coproduct of
$\{X_s\}_{s \in S}$ in C, then $(X, \{p_s\}_{s \in S})$ is a product of $\{X_s\}_{s \in S}$ in $C^{\mathrm{opp}}$.

(b) Show that Proposition 4.64 for C follows from the validity of Proposition
4.63 for $C^{\mathrm{opp}}$.


CHAPTER  V 


Theory  of  a  Single  Linear  Transformation 


Abstract. The goal of this chapter is to find finitely many canonical representatives of each
similarity  class  of  square  matrices  with  entries  in  a  field  and  correspondingly  of  each  isomorphism 
class  of  linear  maps  from  a  finite-dimensional  vector  space  to  itself. 

Section  1  frames  the  problem  in  more  detail.  Section  2  develops  the  theory  of  determinants  over 
a  commutative  ring  with  identity  in  order  to  be  able  to  work  easily  with  characteristic  polynomials 
det(XI − A). The discussion is built around the principle of “permanence of identities,” which
allows  for  passage  from  certain  identities  with  integer  coefficients  to  identities  with  coefficients  in 
the  ring  in  question. 

Section 3 introduces the minimal polynomial of a square matrix or linear map. The Cayley-Hamilton
Theorem establishes that such a matrix satisfies its characteristic equation, and it follows
that  the  minimal  polynomial  divides  the  characteristic  polynomial.  It  is  proved  that  a  matrix  is 
similar  to  a  diagonal  matrix  if  and  only  if  its  minimal  polynomial  is  the  product  of  distinct  factors 
of  degree  1 .  In  combination  with  the  fact  that  two  diagonal  matrices  are  similar  if  and  only  if  their 
diagonal  entries  are  permutations  of  one  another,  this  result  solves  the  canonical-form  problem  for 
matrices  whose  minimal  polynomial  is  the  product  of  distinct  factors  of  degree  1 . 

Section  4  introduces  general  projection  operators  from  a  vector  space  to  itself  and  relates  them  to 
vector-space direct-sum decompositions with finitely many summands. The summands of a direct-sum
decomposition are invariant under a linear map if and only if the linear map commutes with
each  of  the  projections  associated  to  the  direct-sum  decomposition. 

Section 5 concerns the Primary Decomposition Theorem, whose subject is the operation of
a linear map L : V → V with V finite-dimensional. The statement is that if L has minimal
polynomial $P_1(X)^{l_1} \cdots P_k(X)^{l_k}$ with the $P_j(X)$ distinct monic primes, then V has a unique direct-sum
decomposition in which the respective summands are the kernels of the linear maps $P_j(L)^{l_j}$,
and moreover the minimal polynomial of the restriction of L to the jth summand is $P_j(X)^{l_j}$.

Sections  6-7  concern  Jordan  canonical  form.  For  the  case  that  the  prime  factors  of  the  minimal 
polynomial  of  a  square  matrix  all  have  degree  1 ,  the  main  theorem  gives  a  canonical  form  under 
similarity,  saying  that  a  given  matrix  is  similar  to  one  in  "Jordan  form"  and  that  the  Jordan  form 
is  completely  determined  up  to  permutation  of  the  constituent  blocks.  The  theorem  applies  to  all 
square matrices if the field is algebraically closed, as is the case for C. The theorem is stated and
proved  in  Section  6,  and  Section  7  shows  how  to  make  computations  in  two  different  ways. 


1.  Introduction 

This  chapter  will  work  with  vector  spaces  over  a  common  field  of  “scalars,”  which 
will  be  called  K.  As  was  observed  near  the  end  of  Section  IV.5,  all  the  results 
concerning vector spaces in Chapter II remain valid when the scalars are taken
from  K  rather  than  just  Q  or  R  or  C.  The  ring  of  polynomials  in  one  indeterminate 
X over K will be denoted by K[X].

For  the  field  C  of  complex  numbers,  every  nonconstant  polynomial  in  C[X] 
has  a  root,  according  to  the  Fundamental  Theorem  of  Algebra  (Theorem  1.18). 
Because  of  this  fact  some  results  in  this  chapter  will  take  an  especially  simple 
form  when  K  =  C,  and  this  simple  form  will  persist  for  any  field  with  this 
same  property.  Accordingly,  we  make  a  definition.  Let  us  say  that  a  field  K  is 
algebraically  closed  if  every  nonconstant  polynomial  in  K[X]  has  a  root.  We 
shall  work  hard  in  Chapter  IX  to  obtain  examples  of  algebraically  closed  fields 
beyond  K  =  C,  but  let  us  mention  now  what  a  few  of  them  are. 

Examples. 

(1)  The  subset  of  C  of  all  roots  of  polynomials  with  rational  coefficients  is  an 
algebraically  closed  field. 

(2)  For  each  prime  p,  we  have  seen  that  any  finite  field  of  characteristic  p  has 
$p^n$ elements for some n. It turns out that there is one and only one field of $p^n$
elements,  up  to  isomorphism,  for  each  n.  If  we  align  them  suitably  for  fixed  p 
and  take  their  union  on  n,  then  the  result  is  an  algebraically  closed  field. 

(3)  If  K  is  any  field,  then  there  exists  an  algebraically  closed  field  having  K  as 
a  subfield.  We  shall  prove  this  existence  in  Chapter  IX  by  means  of  Zermelo’s 
Well-Ordering  Theorem  (which  appears  in  Section  A5  of  the  appendix). 

The  general  problem  to  be  addressed  in  this  chapter  is  to  find  “canonical  forms” 
for  linear  maps  from  finite-dimensional  vector  spaces  to  themselves,  special  ways 
of  realizing  the  linear  maps  that  bring  out  some  of  their  properties.  Let  us  phrase 
a  specific  problem  of  this  kind  completely  in  terms  of  linear  algebra  at  first.  Then 
we  can  rephrase  it  in  terms  of  a  combination  of  linear  algebra  and  group  theory, 
and  we  shall  see  how  it  fits  into  a  more  general  context. 

In  terms  of  matrices,  the  specific  problem  is  to  find  a  way  of  deciding  whether 
two  square  matrices  represent  the  same  linear  map  in  different  bases.  We  know 
from Proposition 2.17 that if L : V → V is linear on the finite-dimensional
vector space V and if A is the matrix of L relative to a particular ordered basis in
domain and range, then the matrix B of L in another ordered basis is of the form
$B = C^{-1}AC$ for some invertible matrix C, i.e., A and B are similar.¹ Thus one
kind of solution to the problem would be to specify one representative of each
similarity class of square matrices. But this is not a convenient kind of answer
to look for; in fact, the matrices $A = \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}$ and $B = \begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix}$ are similar via
$C = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$, but there is no particular reason to prefer one of A or B to the other.
Thus a “canonical form” for detecting similarity will allow more than one representative
of each similarity class (but typically only finitely many such representatives),
and a supplementary statement will tell us when two such are similar.

¹A square matrix A with a two-sided inverse is sometimes said to be nonsingular. A square
matrix with no inverse is then said to be singular.

So far, the best information that we have about solving this problem concerning
square matrices comes from Section II.8. In that section the discussion of eigenvalues
gave us some necessary conditions for similarity, but we did not obtain a
useful necessary and sufficient condition.
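A small exact computation illustrates both points: similar matrices share a characteristic polynomial, but sharing a characteristic polynomial is only a necessary condition for similarity. The sketch below uses A = diag(1, 2), B = diag(2, 1), and the permutation matrix C that swaps the two coordinates, together with the pair I and [[1, 1], [0, 1]]; the helper names are ours.

```python
from fractions import Fraction


def mul2(x, y):
    """Product of 2-by-2 matrices given as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def inv2(x):
    """Inverse of an invertible 2-by-2 matrix, with exact rational entries."""
    (a, b), (c, d) = x
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]


def char_poly2(x):
    """Coefficients (1, -trace, det) of det(XI - x) for a 2-by-2 matrix x."""
    (a, b), (c, d) = x
    return (1, -(a + d), a * d - b * c)


A = [[1, 0], [0, 2]]
B = [[2, 0], [0, 1]]
C = [[0, 1], [1, 0]]
print(mul2(mul2(inv2(C), A), C) == B, char_poly2(A) == char_poly2(B))

# Equal characteristic polynomials do not force similarity: C^{-1} I C = I for
# every invertible C, so J is not similar to I although both give (1, -2, 1).
I = [[1, 0], [0, 1]]
J = [[1, 1], [0, 1]]
print(char_poly2(I) == char_poly2(J), char_poly2(J))
```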

In terms of linear maps, what we seek for a linear L : V → V is to use the
geometry of L to construct an ordered basis of V such that L acts in a particularly
simple way on that ordered basis. Ideally the description of how L acts on the
ordered basis is to be detailed enough so that the matrix of L in that ordered basis
is completely determined by the description, even though the ordered basis may
not be determined by it. For example, if L were to have a basis of eigenvectors,
then the description could be that “L has an ordered basis of eigenvectors with
eigenvalues $x_1, \dots, x_n$.” In any ordered basis with this property, the matrix of L
would then be diagonal with diagonal entries $x_1, \dots, x_n$.

Suppose  then  that  we  have  this  kind  of  detailed  description  of  how  a  linear 
map  L  acts  on  some  ordered  basis.  To  what  extent  is  L  completely  determined? 
The  answer  is  that  L  is  determined  up  to  an  isomorphism  of  the  underlying  vector 
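space. In fact, suppose that L and M are linear maps from V to itself, let n = dim V, and
write $[T]_{\Gamma\to\Delta}$ for the matrix of a linear map T relative to an ordered basis Γ in the domain
and an ordered basis Δ in the range. Suppose that

$$[L]_{\Gamma\to\Gamma} = A = [M]_{\Delta\to\Delta}$$

for some ordered bases $\Gamma = (\gamma_1, \dots, \gamma_n)$ and $\Delta = (\delta_1, \dots, \delta_n)$. Then

$$[L]_{\Gamma\to\Gamma} = [M]_{\Delta\to\Delta} = [S^{-1}]_{\Delta\to\Gamma}\,[M]_{\Delta\to\Delta}\,[S]_{\Gamma\to\Delta} = [S^{-1}MS]_{\Gamma\to\Gamma},$$

where S : V → V is the invertible linear map defined by $S(\gamma_j) = \delta_j$ for 1 ≤ j ≤ n, so that
$[S]_{\Gamma\to\Delta}$ and $[S^{-1}]_{\Delta\to\Gamma}$ are both identity matrices.

Hence $L = S^{-1}MS$ and SL = MS. In other words, if we think of having
two copies of V, one called $V_1$ and the other called $V_2$, that are isomorphic via
$S : V_1 \to V_2$, then the effect of M in $V_2$ corresponds under S to the effect of L
in $V_1$. In this sense, L is determined up to an isomorphism of V.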

Thus  we  are  looking  for  a  geometric  description  that  determines  linear  maps 
up  to  isomorphism.  Two  linear  maps  L  and  M  that  are  related  in  this  way  have 
$L = S^{-1}MS$ for some invertible linear map S. Passing to matrices with respect to
some  basis,  we  see  that  the  matrices  of  L  and  M  are  to  be  similar.  Consequently 
our  two  problems,  one  to  characterize  similarity  for  matrices  and  the  other  to 
characterize  isomorphism  for  linear  maps,  come  to  the  same  thing. 
These two problems have an interpretation in terms of group theory. In the
case of n-by-n matrices, the group GL(n, K) of invertible matrices acts on the set
of all square matrices of size n by conjugation via $(g, x) \mapsto gxg^{-1}$; the similarity
classes are exactly the orbits of this group action, and the canonical form is to
single out finitely many representatives from each orbit. In the case of linear
maps, the group GL(V) of invertible linear maps on the finite-dimensional vector
space  V  acts  by  conjugation  on  the  set  of  all  linear  maps  from  V  into  itself;  the 
isomorphism  classes  of  linear  maps  on  V  are  the  orbits,  and  the  canonical  form 
is  to  single  out  finitely  many  representatives  from  each  orbit. 
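The orbit description can be tested directly in the smallest cases. The sketch below assumes K = Z/pZ for a prime p, so that GL(2, K) is finite and can be enumerated, and partitions all 2-by-2 matrices into orbits under conjugation; helper names are ours. The counts it prints, 6 orbits for p = 2 and 12 for p = 3, agree with the known count of p² + p similarity classes of 2-by-2 matrices over a field with p elements.

```python
from itertools import product


def mat_mul(x, y, p):
    """Product over Z/pZ of 2-by-2 matrices stored as (a, b, c, d) = [[a, b], [c, d]]."""
    (a, b, c, d), (e, f, g, h) = x, y
    return ((a * e + b * g) % p, (a * f + b * h) % p,
            (c * e + d * g) % p, (c * f + d * h) % p)


def inverse(x, p):
    """Inverse mod p of an invertible 2-by-2 matrix (pow(., -1, p) needs Python 3.8+)."""
    a, b, c, d = x
    k = pow((a * d - b * c) % p, -1, p)
    return ((d * k) % p, (-b * k) % p, (-c * k) % p, (a * k) % p)


def similarity_classes(p):
    """Orbits of GL(2, Z/pZ) acting on all 2-by-2 matrices by (g, x) -> g x g^{-1}."""
    matrices = list(product(range(p), repeat=4))
    group = [g for g in matrices if (g[0] * g[3] - g[1] * g[2]) % p != 0]
    remaining, orbits = set(matrices), []
    while remaining:
        x = remaining.pop()
        orbit = {mat_mul(mat_mul(g, x, p), inverse(g, p), p) for g in group}
        remaining -= orbit
        orbits.append(orbit)
    return orbits


for p in [2, 3]:
    orbits = similarity_classes(p)
    print(p, len(orbits), sum(len(o) for o in orbits))
```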

The  above  problem,  whether  for  matrices  or  for  linear  maps,  does  not  have  a 
unique  acceptable  solution.  Nevertheless,  the  text  of  this  chapter  will  ultimately 
concentrate  on  one  such  solution,  known  as  the  “Jordan  canonical  form.” 

Now  that  we  have  brought  group  theory  into  the  statement  of  the  problem,  we 
can  put  matters  in  a  more  general  context:  The  situation  is  that  some  “important” 
group  G  acts  in  an  important  way  on  an  “interesting”  vector  space  of  matrices.  The 
canonical-form problem for this situation is to single out finitely many representatives
of each orbit and give a way of deciding, in terms of these representatives,
whether two of the given matrices lie in the same orbit. We shall not pursue the
more general problem in the text at this time. However, Problem 1 at the end of
the chapter addresses one version beyond the one concerning similarity: to find
a canonical form for the action of GL(m, K) × GL(n, K) on m-by-n matrices
by $((g, h), x) \mapsto gxh^{-1}$. Some other groups that are important in this sense,
besides  products  of  general  linear  groups,  are  introduced  in  Chapter  VI,  and  a 
problem  at  the  end  of  Chapter  VI  reinterprets  two  theorems  of  that  chapter  as 
further  canonical-form  theorems  under  the  action  of  a  general  linear  group. 

Let us return to the canonical-form problems for similarity of matrices and isomorphism of linear maps. The basic tool in studying these problems is the characteristic polynomial of a matrix or a linear map, as in Chapter II. However, we subtly used a special feature of $\mathbb{Q}$ and $\mathbb{R}$ and $\mathbb{C}$ in working with characteristic polynomials in Chapter II: we passed back and forth between the characteristic polynomial $\det(\lambda I - A)$ as a polynomial in one indeterminate (defined by its expression after expanding it out) and as a polynomial function of $\lambda$, defined for each value of $\lambda$ in $\mathbb{Q}$ or $\mathbb{R}$ or $\mathbb{C}$, one value at a time. This passage was legitimate because the homomorphism of the ring of polynomials in one indeterminate over a field to the ring of polynomial functions is one-one when the field is infinite, by Proposition 4.28c or Corollary 1.14. Some care is required, however, in working with general fields, and we begin by supplying the necessary details for justifying manipulations with determinants in a more general setting than earlier. The end result will be that the characteristic polynomial is a polynomial in one indeterminate, and we shall henceforth call that indeterminate X, rather than $\lambda$, so as to emphasize this point of view.


2.  Determinants  over  Commutative  Rings  with  Identity 

Throughout this section let R be a commutative ring with identity. The main case of interest for us at this time will be that $R = K[X]$ is the polynomial ring in one indeterminate X over a field K.

The set of n-by-n matrices with entries in R is an abelian group under entry-by-entry addition, and matrix multiplication makes it into a ring with identity. Following tradition, we shall usually write $M_n(R)$ rather than $M_{nn}(R)$ for this ring. In this section we shall define a determinant function $\det : M_n(R) \to R$ and establish some of its properties. For the case that R is a field, some of our earlier proofs concerning determinants used vector-space concepts (bases, dimensions, and so forth), and these are not available for general R. Yet most of the properties of determinants remain valid for general R because of a phenomenon known as permanence of identities. We shall not try to state a general theorem about this principle but instead will be content to observe a pattern in how the relevant identities are proved.

If A is in $M_n(R)$, we define its determinant to be

$$\det A = \sum_{\sigma \in \mathfrak{S}_n} (\operatorname{sgn} \sigma) A_{1\sigma(1)} A_{2\sigma(2)} \cdots A_{n\sigma(n)},$$

in effect converting into a definition the formula obtained in Theorem 2.34d when R is a field.
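The defining formula can be transcribed directly into code. The following Python sketch is only an illustration under our own conventions (the helper names `perm_sign` and `det_leibniz` are hypothetical, and indices run from 0 rather than 1). It uses nothing but addition, subtraction, and multiplication of entries, so it applies verbatim to integers, to `Fraction` values, or to any Python type modeling a commutative ring with identity. It sums n! terms and is meant for tiny matrices only.

```python
from itertools import permutations
from math import prod


def perm_sign(perm):
    """Return sgn(perm) for a permutation of 0, ..., n-1, computed from its inversion count."""
    n = len(perm)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1


def det_leibniz(A):
    """det A = sum over sigma of sgn(sigma) * A[0][sigma(0)] * ... * A[n-1][sigma(n-1)]."""
    n = len(A)
    return sum(
        perm_sign(sigma) * prod(A[i][sigma[i]] for i in range(n))
        for sigma in permutations(range(n))
    )


print(det_leibniz([[1, 2], [3, 4]]))                    # -2
print(det_leibniz([[1, 2, 3], [4, 5, 6], [7, 8, 10]]))  # -3
```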

A sample of the kind of identity we have in mind is the formula

$$\det(AB) = \det A \det B \qquad \text{for } A \text{ and } B \text{ in } M_n(R).$$

The key is that this formula says that two polynomials in $2n^2$ variables, with integer coefficients, are equal whenever arbitrary members of R are substituted for the variables. Thus let us introduce $2n^2$ indeterminates $X_{11}, X_{12}, \dots, X_{nn}$ and $Y_{11}, Y_{12}, \dots, Y_{nn}$ to correspond to these variables. Forming the commutative ring $S = \mathbb{Z}[X_{11}, X_{12}, \dots, X_{nn}, Y_{11}, Y_{12}, \dots, Y_{nn}]$, we assemble the matrices $X = [X_{ij}]$, $Y = [Y_{ij}]$, and $XY = [\sum_k X_{ik}Y_{kj}]$ in $M_n(S)$. Consider the two members of S given by

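$$\det X \det Y = \Big(\sum_{\sigma \in \mathfrak{S}_n} (\operatorname{sgn} \sigma) X_{1\sigma(1)} X_{2\sigma(2)} \cdots X_{n\sigma(n)}\Big)\Big(\sum_{\sigma \in \mathfrak{S}_n} (\operatorname{sgn} \sigma) Y_{1\sigma(1)} Y_{2\sigma(2)} \cdots Y_{n\sigma(n)}\Big)$$

and

$$\det(XY) = \sum_{\sigma \in \mathfrak{S}_n} (\operatorname{sgn} \sigma) (XY)_{1\sigma(1)} (XY)_{2\sigma(2)} \cdots (XY)_{n\sigma(n)},$$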
where $(XY)_{ij} = \sum_k X_{ik}Y_{kj}$. If we fix arbitrary elements $x_{11}, x_{12}, \dots, x_{nn}$ and $y_{11}, y_{12}, \dots, y_{nn}$ of $\mathbb{Z}$, then Proposition 4.30 gives us a unique substitution homomorphism $\Psi : S \to \mathbb{Z}$ such that $\Psi(1) = 1$, $\Psi(X_{ij}) = x_{ij}$, and $\Psi(Y_{ij}) = y_{ij}$ for all i and j. Writing $x = [x_{ij}]$ and $y = [y_{ij}]$ and using that matrices with integer entries have $\det(xy) = \det x \det y$ because $\mathbb{Z}$ is a subset of the field $\mathbb{Q}$, we see that $\Psi(\det(XY)) = \Psi(\det X \det Y)$ for each choice of x and y. Since $\mathbb{Z}$ is an infinite integral domain and since x and y are arbitrary, Corollary 4.32 allows us to deduce that

$$\det(XY) = \det X \det Y$$

as an equality in S.

Now we pass from an identity in S to an identity in R. Let $1_R$ be the identity in R. Proposition 4.19 gives us a unique homomorphism of rings $\varphi_1 : \mathbb{Z} \to R$ such that $\varphi_1(1) = 1_R$. If we fix arbitrary elements $A_{11}, A_{12}, \dots, A_{nn}$ and $B_{11}, B_{12}, \dots, B_{nn}$ of R, then Proposition 4.30 gives us a unique substitution homomorphism $\Phi : S \to R$ such that $\Phi(1) = \varphi_1(1) = 1_R$, $\Phi(X_{ij}) = A_{ij}$ for all i and j, and $\Phi(Y_{ij}) = B_{ij}$ for all i and j. Applying $\Phi$ to the equality $\det(XY) = \det X \det Y$, we obtain the identity we sought, namely

$$\det(AB) = \det A \det B \qquad \text{for } A \text{ and } B \text{ in } M_n(R).$$
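Permanence of identities is a proof technique, not something a program can establish, but a brute-force check makes the statement concrete in a ring that is not a field. The sketch below is our own illustration (the modulus 4 and the helper names are arbitrary choices): it verifies $\det(AB) = \det A \det B$ for every pair of 2-by-2 matrices over $\mathbb{Z}/4\mathbb{Z}$, a commutative ring with identity in which $2 \cdot 2 = 0$. It tests one instance only; the argument above is what covers every R and every n.

```python
from itertools import product

MOD = 4  # work in Z/4Z, where 2 * 2 = 0, so this ring is not a field


def det2(a):
    """Determinant of a 2-by-2 matrix over Z/MOD Z."""
    return (a[0][0] * a[1][1] - a[0][1] * a[1][0]) % MOD


def matmul2(a, b):
    """Product of two 2-by-2 matrices over Z/MOD Z."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) % MOD for j in range(2)]
            for i in range(2)]


def multiplicativity_holds():
    """Check det(AB) == det(A) det(B) for all 256 * 256 pairs of 2-by-2 matrices."""
    mats = [[[p, q], [r, s]] for p, q, r, s in product(range(MOD), repeat=4)]
    return all(det2(matmul2(a, b)) == (det2(a) * det2(b)) % MOD
               for a in mats for b in mats)


print(multiplicativity_holds())  # True
```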

Proposition 5.1. If R is a commutative ring with identity, then the determinant function $\det : M_n(R) \to R$ has the following properties:

(a) $\det(AB) = \det A \det B$,

(b) $\det I = 1$,

(c) $\det A^t = \det A$,

(d) $\det C = \det A + \det B$ if A, B, and C match in all rows but the jth and if the jth row of C is the sum of the jth rows of A and B,

(e) $\det B = r \det A$ if A and B match in all rows but the jth and if the jth row of B is equal entry by entry to r times the jth row of A for some r in R,

(f) $\det A = 0$ if A has two equal rows,

(g) $\det \begin{pmatrix} A & * \\ 0 & D \end{pmatrix} = \det A \det D$ if A is in $M_k(R)$, D is in $M_l(R)$, and $k + l = n$.

Remarks. Properties (d), (e), and (f) imply that the usual steps in manipulating determinants by row reduction continue to be valid.

PROOF. Part (a) was proved above, and parts (c) through (f) may be proved in the same way from the corresponding facts about integer matrices in Section II.7. Part (b) is immediate from the definition.

For (g), we first prove the result when the entries are in $\mathbb{Q}$, and then we argue in the same way as with (a) above. When the entries are in $\mathbb{Q}$, row reduction of D allows us to reduce to the case either that D has a row of 0's or that D is the identity. If D has a row of 0's, then $\det \begin{pmatrix} A & * \\ 0 & D \end{pmatrix}$ and $\det A \det D$ are both 0 and hence are equal. If D is the identity, then further row reduction shows that $\det \begin{pmatrix} A & * \\ 0 & I \end{pmatrix} = \det \begin{pmatrix} A & 0 \\ 0 & I \end{pmatrix}$, and the right side equals $\det A = \det A \det I$, as required. □


Proposition 5.2 (expansion in cofactors). Let R be a commutative ring with identity, let A be in $M_n(R)$, and let $\widetilde{A}_{ij}$ be the member of $M_{n-1}(R)$ obtained by deleting the ith row and the jth column from A. Then

(a) for any j, $\det A = \sum_{i=1}^n (-1)^{i+j} A_{ij} \det \widetilde{A}_{ij}$, i.e., $\det A$ may be calculated by “expansion in cofactors” about the jth column,

(b) for any i, $\det A = \sum_{j=1}^n (-1)^{i+j} A_{ij} \det \widetilde{A}_{ij}$, i.e., $\det A$ may be calculated by “expansion in cofactors” about the ith row.

PROOF. This may be derived in the same way from Proposition 2.36 by using the principle of permanence of identities. □
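Expansion in cofactors gives a recursive way to compute the same determinant. In this sketch (hypothetical helpers `minor` and `det_cofactor`, with 0-based indices), `minor(A, i, j)` plays the role of $\widetilde{A}_{ij}$; since $(-1)^{i+j}$ has the same parity for 0-based and 1-based indices, the signs agree with Proposition 5.2b. Any row may be chosen, and expansion about a column is obtained by expanding the transpose.

```python
def minor(A, i, j):
    """Delete row i and column j of A (0-based indices)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]


def det_cofactor(A, i=0):
    """Expand det A in cofactors about row i (Proposition 5.2b)."""
    n = len(A)
    if n == 0:
        return 1  # determinant of the empty matrix
    return sum((-1) ** (i + j) * A[i][j] * det_cofactor(minor(A, i, j))
               for j in range(n))


A = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
print([det_cofactor(A, i) for i in range(3)])  # [-3, -3, -3]
```

Every choice of row gives the same value, as Proposition 5.2 asserts.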


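Corollary 5.3 (Vandermonde matrix and determinant). If $r_1, \dots, r_n$ lie in a commutative ring R with identity, then

$$\det \begin{pmatrix} 1 & r_1 & r_1^2 & \cdots & r_1^{n-1} \\ 1 & r_2 & r_2^2 & \cdots & r_2^{n-1} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & r_n & r_n^2 & \cdots & r_n^{n-1} \end{pmatrix} = \prod_{j > i} (r_j - r_i).$$

PROOF. The derivation of this from Proposition 5.2 is the same as the derivation of Corollary 2.37 from Proposition 2.35. □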
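A quick numerical check of Corollary 5.3, offered as an illustration only: the sketch builds the matrix whose ith row is $(1, r_i, r_i^2, \dots, r_i^{n-1})$ and compares its determinant, computed by a compact cofactor expansion, with the product of $r_j - r_i$ over pairs $i < j$. The sample values $r = (2, 3, 5, 7)$ are arbitrary.

```python
from itertools import combinations
from math import prod


def det(A):
    """Determinant by cofactor expansion along the first row."""
    if not A:
        return 1
    return sum((-1) ** j * A[0][j] * det([row[:j] + row[j + 1:] for row in A[1:]])
               for j in range(len(A)))


def vandermonde(rs):
    """Matrix whose ith row is 1, r_i, r_i^2, ..., r_i^(n-1)."""
    return [[r ** k for k in range(len(rs))] for r in rs]


rs = [2, 3, 5, 7]
lhs = det(vandermonde(rs))
rhs = prod(rs[j] - rs[i] for i, j in combinations(range(len(rs)), 2))
print(lhs, rhs)  # 240 240
```

A repeated value among the $r_i$ makes two rows equal, and both sides are then 0.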


Proposition 5.4 (Cramer’s rule). Let R be a commutative ring with identity, let A be in $M_n(R)$, and define $A^{\mathrm{adj}}$ in $M_n(R)$ to be the classical adjoint of A, namely the matrix with entries $A^{\mathrm{adj}}_{ij} = (-1)^{i+j} \det \widetilde{A}_{ji}$, where $\widetilde{A}_{ji}$ is defined as in the statement of Proposition 5.2. Then $A A^{\mathrm{adj}} = A^{\mathrm{adj}} A = (\det A) I$.

PROOF. This may be derived from Proposition 2.38 in the same way as for Propositions 5.1 and 5.2 using the principle of permanence of identities. □
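The classical adjoint is easy to compute directly from its definition, and the identity of Proposition 5.4 can then be checked entrywise. Note the transposed indices: entry (i, j) uses the minor with row j and column i deleted. The sketch below is an illustration with integer entries and hypothetical helper names.

```python
def minor(A, i, j):
    """Delete row i and column j of A (0-based)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]


def det(A):
    """Determinant by cofactor expansion along the first row."""
    if not A:
        return 1
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j)) for j in range(len(A)))


def adjugate(A):
    """Classical adjoint: entry (i, j) is (-1)^(i+j) det of A with row j, column i deleted."""
    n = len(A)
    return [[(-1) ** (i + j) * det(minor(A, j, i)) for j in range(n)] for i in range(n)]


def matmul(A, B):
    """Matrix product of A and B (lists of rows)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


A = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
d = det(A)
scalar = [[d if i == j else 0 for j in range(3)] for i in range(3)]  # (det A) I
print(matmul(A, adjugate(A)) == scalar == matmul(adjugate(A), A))   # True
```

For a 2-by-2 matrix this reproduces the familiar formula: `adjugate([[a, b], [c, d]])` is `[[d, -b], [-c, a]]`.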


Corollary 5.5. Let R be a commutative ring with identity, and let A be in $M_n(R)$. If $\det A$ is a unit in R, then A has a two-sided inverse in $M_n(R)$. Conversely if A has a one-sided inverse in $M_n(R)$, then $\det A$ is a unit in R.

Remark. If R is a field, then A and any associated linear map are often called nonsingular if invertible, singular otherwise. When R is not a field, terminology varies for what to call a noninvertible matrix whose determinant is not 0.

PROOF. If $\det A$ is a unit in R, let r be its multiplicative inverse. Then Proposition 5.4 shows that $rA^{\mathrm{adj}}$ is a two-sided inverse of A. Conversely if A has, say, a left inverse B, then $BA = I$ implies $(\det B)(\det A) = \det I = 1$, and $\det B$ is an inverse for $\det A$. A similar argument applies if A has a right inverse. □
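Corollary 5.5 can be seen at work in the ring $\mathbb{Z}/m\mathbb{Z}$. The sketch below (an illustration with hypothetical helper names) returns $rA^{\mathrm{adj}}$ reduced mod m, where r is the inverse of $\det A$ modulo m, and returns `None` when $\det A$ is not a unit. It relies on Python 3.8 or later, where `pow(d, -1, m)` computes a modular inverse and raises `ValueError` when none exists. The second sample is the situation of the Remark: over $\mathbb{Z}/6\mathbb{Z}$ the matrix has determinant 4, which is not 0 but is not a unit, so the matrix is not invertible.

```python
def minor(A, i, j):
    """Delete row i and column j of A (0-based)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]


def det(A):
    """Determinant by cofactor expansion along the first row."""
    if not A:
        return 1
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j)) for j in range(len(A)))


def adjugate(A):
    """Classical adjoint of A."""
    n = len(A)
    return [[(-1) ** (i + j) * det(minor(A, j, i)) for j in range(n)] for i in range(n)]


def inverse_mod(A, m):
    """Two-sided inverse of A over Z/mZ if det A is a unit mod m, else None."""
    try:
        r = pow(det(A) % m, -1, m)  # multiplicative inverse of det A in Z/mZ
    except ValueError:
        return None  # det A is not a unit, so by Corollary 5.5 A has no inverse
    return [[(r * x) % m for x in row] for row in adjugate(A)]


A = [[1, 2], [3, 4]]        # det A = -2
print(inverse_mod(A, 5))    # [[3, 1], [4, 2]]  (det A = 3, a unit in Z/5Z)
print(inverse_mod(A, 6))    # None  (det A = 4 in Z/6Z is nonzero but not a unit)
```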


3.  Characteristic  and  Minimal  Polynomials 

Again let K be a field. If A is in $M_n(K)$, the characteristic polynomial of A is defined to be the member of the ring $K[X]$ of polynomials in one indeterminate X given by $F(X) = \det(XI - A)$. The material of Section 2 shows that F(X) is well defined, being the determinant of a member of $M_n(K[X])$. It is apparent from the definition of determinant in Section 2 that F(X) is a monic polynomial of degree n with coefficient $-\operatorname{Tr} A = -\sum_{j=1}^n A_{jj}$ for $X^{n-1}$. Evaluating F(X) at 0, we see that the constant term is $(-1)^n \det A$.
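To make the point of view of this section concrete, the sketch below computes $F(X) = \det(XI - A)$ exactly as defined: the entries of $XI - A$ are treated as elements of the ring of integer polynomials (coefficient lists, constant term first), and the Leibniz formula is evaluated with ring operations in that ring only, never substituting numbers for X. The representation and helper names are our own assumptions for illustration. The output can be compared with the facts just stated: F is monic, its coefficient of $X^{n-1}$ is $-\operatorname{Tr} A$, and its constant term is $(-1)^n \det A$.

```python
from itertools import permutations


def poly_add(p, q):
    """Sum of coefficient lists (constant term first)."""
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0) for i in range(n)]


def poly_mul(p, q):
    """Product of coefficient lists (constant term first)."""
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r


def perm_sign(perm):
    """Sign of a permutation of 0, ..., n-1 via its inversion count."""
    n = len(perm)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1


def char_poly(A):
    """Coefficients of det(XI - A), constant term first, via the Leibniz formula over Z[X]."""
    n = len(A)
    # entry (i, j) of XI - A: -A[i][j] + X on the diagonal, -A[i][j] off it
    B = [[[-A[i][j], 1] if i == j else [-A[i][j]] for j in range(n)] for i in range(n)]
    F = [0]
    for sigma in permutations(range(n)):
        term = [perm_sign(sigma)]
        for i in range(n):
            term = poly_mul(term, B[i][sigma[i]])
        F = poly_add(F, term)
    return F


print(char_poly([[1, 2], [3, 4]]))                    # [-2, -5, 1]
print(char_poly([[1, 2, 3], [4, 5, 6], [7, 8, 10]]))  # [3, -12, -16, 1]
```

For the 3-by-3 sample, $\operatorname{Tr} A = 16$ and $\det A = -3$, matching the coefficients $-16$ and $(-1)^3(-3) = 3$.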

Since the determinant of a product in $M_n(K[X])$ is the product of the determinants (Proposition 5.1a) and since $C^{-1}(XI - A)C = XI - C^{-1}AC$, we have

$$\det(XI - C^{-1}AC) = (\det C)^{-1} \det(XI - A) (\det C) = \det(XI - A).$$

Thus similar matrices have equal characteristic polynomials. If V is an n-dimensional vector space over K and $L : V \to V$ is linear, then the matrices of L in any two ordered bases of V (the domain basis being assumed equal to the range basis) are similar, and their characteristic polynomials are the same. Consequently we can define the characteristic polynomial of L to be the characteristic polynomial of any matrix of L.

The development of characteristic polynomials has thus been redone in a way that is valid over any field K without making use of the ring homomorphism from polynomials in one indeterminate over K to polynomial functions from K into itself. The discussion in Section II.8 of eigenvectors and eigenvalues for members A of $M_n(K)$ and for linear maps $L : V \to V$ with V finite-dimensional over K is now meaningful, and there is no need to repeat it.

In particular, the eigenvalues of A and L are exactly the roots of their characteristic polynomial, no matter what K is. If K is algebraically closed, then the characteristic polynomial has a root, and consequently A and L each have at least one eigenvalue.

If $L : V \to V$ is linear and V is finite-dimensional, then a vector subspace U of V is said to be invariant under L if $L(U) \subseteq U$. In this case $L|_U$ is a well-defined linear map from U to itself. Since $L(U) \subseteq U$, Proposition 2.25 shows that $L : V \to V$ factors through V/U as a linear map $\overline{L} : V/U \to V/U$. We shall use this construction, the existence of eigenvalues in the algebraically closed case, and an induction to prove the following.

Proposition 5.6. If K is an algebraically closed field, if V is a finite-dimensional vector space over K, and if $L : V \to V$ is linear, then V has an ordered basis in which the matrix of L is upper triangular. Consequently any member of $M_n(K)$ is similar to an upper triangular matrix.

Remarks. For an upper triangular matrix $A = \begin{pmatrix} c_1 & & * \\ & \ddots & \\ 0 & & c_n \end{pmatrix}$ in $M_n(K)$, the characteristic polynomial is $\prod_{j=1}^n (X - c_j)$ because the only nonzero term in the definition of $\det(XI - A)$ is the one corresponding to the identity permutation. Triangular form is not yet the canonical form we seek for a square matrix because a particular square matrix may be similar to infinitely many matrices in triangular form.

PROOF. We proceed by induction on $n = \dim V$, with the base case $n = 1$ being clear. Suppose that the result holds for all linear maps from spaces of dimension $< n$ to themselves. Given $L : V \to V$ with $\dim V = n$, let $v_1$ be an eigenvector of L. This exists by the remarks before the proposition since K is algebraically closed. Let U be the vector subspace $Kv_1$. Then $L(U) \subseteq U$, and Proposition 2.25 shows that $L : V \to V$ factors through V/U as a linear map $\overline{L} : V/U \to V/U$. Since $\dim V/U = n - 1$, the inductive hypothesis produces an ordered basis $(\overline{v}_2, \dots, \overline{v}_n)$ of V/U such that the matrix of $\overline{L}$ is upper triangular in this basis. This condition means that $\overline{L}(\overline{v}_j) = \sum_{i=2}^j c_{ij} \overline{v}_i$ for $j \geq 2$. Select coset representatives $v_2, \dots, v_n$ of $\overline{v}_2, \dots, \overline{v}_n$ so that $\overline{v}_j = v_j + U$ for $j \geq 2$. Then $\overline{L}(v_j + U) = \sum_{i=2}^j c_{ij}(v_i + U)$ for $j \geq 2$, and hence $L(v_j)$ lies in the coset $\sum_{i=2}^j c_{ij} v_i + U$ for $j \geq 2$. For each $j \geq 1$, we then have $L(v_j) = \sum_{i=2}^j c_{ij} v_i + c_{1j} v_1$ for some scalar $c_{1j}$, and we see that $(v_1, \dots, v_n)$ is the required ordered basis. □

Let us return to the situation in which K is any field. For a matrix A in $M_n(K)$ and a polynomial P in $K[X]$, it is meaningful to form P(A). We can do so by two equivalent methods, both useful. The concrete way of forming P(A) is as $P(A) = c_nA^n + \cdots + c_1A + c_0I$ if $P(X) = c_nX^n + \cdots + c_1X + c_0$. The abstract way is to form the subring T of $M_n(K)$ generated by KI and A. This subring is commutative. We let $\varphi : K \to T$ be given by $\varphi(c) = cI$. Then the universal mapping property of $K[X]$ given in Proposition 4.24 produces a unique ring homomorphism $\Phi : K[X] \to T$ such that $\Phi(c) = cI$ for all $c \in K$ and $\Phi(X) = A$. The value of P(A) is the element $\Phi(P)$ of T.

For A in $M_n(K)$, let us study all polynomials P such that $P(A) = 0$. For any polynomial P and any invertible matrix C, we have

$$P(C^{-1}AC) = C^{-1}P(A)C$$

because if $P(X) = c_nX^n + \cdots + c_1X + c_0$, then

$$\begin{aligned} P(C^{-1}AC) &= c_n(C^{-1}AC)^n + \cdots + c_1C^{-1}AC + c_0I \\ &= C^{-1}(c_nA^n + \cdots + c_1A + c_0I)C. \end{aligned}$$

Consequently if $P(A) = 0$, then $P(C^{-1}AC) = 0$, and the set of matrices with $P(A) = 0$ is closed under similarity. We shall make use of this observation a little later in this section.
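The concrete recipe for P(A) is conveniently carried out by Horner's rule, which needs only matrix products and additions of scalar matrices cI. The sketch below is our own illustration (coefficient lists are constant term first, and the matrices are arbitrary integer examples); it also checks $P(C^{-1}AC) = C^{-1}P(A)C$ for one choice of C whose inverse has integer entries.

```python
def matmul(A, B):
    """Matrix product of A and B (lists of rows)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def poly_at_matrix(coeffs, A):
    """Evaluate c_0 I + c_1 A + ... + c_m A^m by Horner's rule (coeffs[k] = c_k)."""
    n = len(A)
    result = [[0] * n for _ in range(n)]
    for c in reversed(coeffs):
        result = matmul(result, A)  # result <- result * A
        for i in range(n):
            result[i][i] += c       # result <- result + c I
    return result


A = [[1, 2], [3, 4]]
C, C_inv = [[1, 1], [0, 1]], [[1, -1], [0, 1]]  # C_inv is the inverse of C
P = [1, 0, 1]                                   # P(X) = 1 + X^2
print(poly_at_matrix(P, A))                     # [[8, 10], [15, 23]]
lhs = poly_at_matrix(P, matmul(matmul(C_inv, A), C))
rhs = matmul(matmul(C_inv, poly_at_matrix(P, A)), C)
print(lhs == rhs)                               # True
```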

Proposition 5.7. If A is in $M_n(K)$, then there exists a nonzero polynomial P in $K[X]$ such that $P(A) = 0$.

PROOF. The K vector space $M_n(K)$ has dimension $n^2$. Therefore the $n^2 + 1$ matrices $I, A, A^2, \dots, A^{n^2}$ are linearly dependent, and we have

$$c_0I + c_1A + c_2A^2 + \cdots + c_{n^2}A^{n^2} = 0$$

for some set of scalars not all 0. Then $P(A) = 0$ for the polynomial $P(X) = c_0 + c_1X + c_2X^2 + \cdots + c_{n^2}X^{n^2}$; this P is not the 0 polynomial since at least one of the coefficients is not 0. □

Alternative proof if K is algebraically closed. Since the set of polynomials P with $P(A) = 0$ depends only on the similarity class of A, Proposition 5.6 shows that there is no loss of generality in assuming that A is upper triangular, $A = \begin{pmatrix} \lambda_1 & & * \\ & \ddots & \\ 0 & & \lambda_n \end{pmatrix}$. Then $A - \lambda_jI$ is upper triangular with 0 in the jth diagonal entry, and $\prod_{j=1}^n (A - \lambda_jI)$ is upper triangular with 0 in all diagonal entries. Since the nth power of an n-by-n upper triangular matrix with 0's on the diagonal is 0, we obtain $\big(\prod_{j=1}^n (A - \lambda_jI)\big)^n = 0$. □

With A fixed, we continue to consider the set of all polynomials P(X) such that $P(A) = 0$. Let us think of P(A) as being computed by the abstract procedure described above, namely as the image of P under the ring homomorphism $\Phi : K[X] \to T$ such that $\Phi(c) = cI$ for all $c \in K$ and $\Phi(X) = A$, where T is the commutative subring of $M_n(K)$ generated by KI and A. Then the set of all polynomials P(X) with $P(A) = 0$ is the kernel of the ring homomorphism $\Phi$. This set is therefore an ideal, and Proposition 5.7 shows that the ideal is nonzero. We shall apply the following proposition to this ideal.

Proposition 5.8. If I is a nonzero ideal in $K[X]$, then there exists a unique monic polynomial of lowest degree in I, and every member of I is the product of this particular polynomial by some other polynomial.

PROOF. Let B(X) be a nonzero member of I of lowest possible degree; adjusting B by a scalar factor, we may assume that B is monic. If A is in I, then Proposition 1.12 produces polynomials Q and R such that $A = BQ + R$ and either $R = 0$ or $\deg R < \deg B$. Since I is an ideal, BQ is in I and hence $R = A - BQ$ is in I. From minimality of the degree of B, we conclude that $R = 0$. Hence $A = BQ$, and A is exhibited as the product of B and some other polynomial Q. If $B_1$ is a second monic polynomial of lowest degree in I, then we can take $A = B_1$ to see that $B_1 = QB$. Since $\deg B_1 = \deg B$, we conclude that $\deg Q = 0$. Thus Q is a constant polynomial. Comparing the leading coefficients of B and $B_1$, we see that $Q(X) = 1$. □

With A fixed in $M_n(K)$, let us apply Proposition 5.8 to the ideal of all polynomials P in $K[X]$ with $P(A) = 0$. The unique monic polynomial of lowest degree in this ideal is called the minimal polynomial of A. Let us try to identify this minimal polynomial.
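The proofs of Propositions 5.7 and 5.8 suggest a direct way to find the minimal polynomial: compute $I, A, A^2, \dots$ in turn and stop at the first power $A^d$ that is a linear combination $c_0I + \cdots + c_{d-1}A^{d-1}$ of the earlier ones; then $M(X) = X^d - c_{d-1}X^{d-1} - \cdots - c_0$. (The earlier powers are linearly independent at that stage, so no monic polynomial of smaller degree vanishes on A.) The sketch below is one way to do this exactly over $\mathbb{Q}$, using Gauss-Jordan elimination with `fractions.Fraction`; the helper names are hypothetical.

```python
from fractions import Fraction


def matmul(A, B):
    """Matrix product of A and B (lists of rows)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def solve_in_span(vectors, target):
    """Return c with sum_k c[k] * vectors[k] == target, or None if target is not in the span.

    Exact Gauss-Jordan elimination over Q; assumes the given vectors are linearly independent.
    """
    d, m = len(vectors), len(target)
    # augmented matrix: one row per coordinate, one column per vector, target last
    M = [[Fraction(v[r]) for v in vectors] + [Fraction(target[r])] for r in range(m)]
    pivot_cols, row = [], 0
    for col in range(d):
        pivot = next((r for r in range(row, m) if M[r][col] != 0), None)
        if pivot is None:
            continue
        M[row], M[pivot] = M[pivot], M[row]
        piv = M[row][col]
        M[row] = [x / piv for x in M[row]]
        for r in range(m):
            if r != row and M[r][col] != 0:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[row])]
        pivot_cols.append(col)
        row += 1
    # target lies outside the span iff some row vanishes in the vector columns but not the last
    if any(all(x == 0 for x in M[r][:d]) and M[r][d] != 0 for r in range(m)):
        return None
    c = [Fraction(0)] * d
    for r, col in enumerate(pivot_cols):
        c[col] = M[r][d]
    return c


def minimal_polynomial(A):
    """Monic minimal polynomial of a square matrix over Q, coefficients constant term first."""
    n = len(A)
    power = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]  # A^0 = I
    earlier = []  # flattened I, A, ..., A^(d-1), linearly independent by construction
    while True:
        flat = [x for row in power for x in row]
        c = solve_in_span(earlier, flat)
        if c is not None:  # A^d = c_0 I + ... + c_{d-1} A^(d-1)
            return [-x for x in c] + [Fraction(1)]
        earlier.append(flat)
        power = matmul(power, A)


print([str(x) for x in minimal_polynomial([[2, 0], [0, 2]])])  # ['-2', '1'], i.e. X - 2
print([str(x) for x in minimal_polynomial([[2, 1], [0, 2]])])  # ['4', '-4', '1'], i.e. (X - 2)^2
```

Termination is guaranteed by Proposition 5.7, which bounds d by $n^2$.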

Theorem 5.9 (Cayley-Hamilton Theorem). If A is in $M_n(K)$ and if $F(X) = \det(XI - A)$ is its characteristic polynomial, then $F(A) = 0$.

PROOF. Let T be the commutative subring of $M_n(K)$ generated by KI and A, and define a member B(X) of $M_n(K[X])$ by $B(X) = XI - A$. The (i, j)th entry of B(X) is $B_{ij}(X) = \delta_{ij}X - A_{ij}$, and $F(X) = \det B(X)$.

Let $C(X) = B(X)^{\mathrm{adj}}$ denote the classical adjoint of B(X) as a member of $M_n(K[X])$; the form of C(X) is given in the statement of Cramer’s rule (Proposition 5.4), and that proposition says that

$$B(X)C(X) = (\det B(X))I = F(X)I.$$

The equality in the (i, j)th entry is the equality $\delta_{ij}F(X) = \sum_k B_{ik}(X)C_{kj}(X)$ of members of $K[X]$. Application of the substitution homomorphism $X \mapsto A$ from $K[X]$ to T gives

$$\delta_{ij}F(A) = \sum_k B_{ik}(A)C_{kj}(A) = \sum_k (\delta_{ik}A - A_{ik}I)C_{kj}(A).$$

Multiplying on the right by the ith standard basis vector $e_i$ and summing on i, we obtain the equality of vectors

$$F(A)e_j = \sum_i \sum_k (\delta_{ik}A - A_{ik}I)C_{kj}(A)e_i = \sum_k C_{kj}(A)\Big(\sum_i (\delta_{ik}Ae_i - A_{ik}e_i)\Big)$$

since $C_{kj}(A)$ lies in the commutative ring T and therefore commutes with $\delta_{ik}A - A_{ik}I$. But $\sum_i (\delta_{ik}Ae_i - A_{ik}e_i) = Ae_k - \sum_i A_{ik}e_i = 0$ for all k, because $Ae_k = \sum_i A_{ik}e_i$ is the kth column of A; therefore $F(A)e_j = 0$. Since j is arbitrary, $F(A) = 0$. □
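A numerical sanity check of Theorem 5.9, offered as an illustration rather than a proof: the sketch computes the coefficients of $F(X) = \det(XI - A)$ by the Faddeev-LeVerrier recursion (a standard technique swapped in here for brevity; it is not the method of the text), evaluates F(A) by Horner's rule, and confirms that the result is the zero matrix for a few arbitrarily chosen integer matrices. Exact arithmetic with `fractions.Fraction` avoids rounding; because the recursion divides by k, it is valid over fields of characteristic 0 such as $\mathbb{Q}$.

```python
from fractions import Fraction


def matmul(A, B):
    """Matrix product of A and B (lists of rows)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def char_poly_fl(A):
    """Coefficients c_0, ..., c_n of det(XI - A) via the Faddeev-LeVerrier recursion."""
    n = len(A)
    c = [Fraction(0)] * (n + 1)
    c[n] = Fraction(1)
    M = [[Fraction(0)] * n for _ in range(n)]  # M_0 = 0
    for k in range(1, n + 1):
        M = matmul(A, M)
        for i in range(n):
            M[i][i] += c[n - k + 1]            # M_k = A M_{k-1} + c_{n-k+1} I
        AM = matmul(A, M)
        c[n - k] = -sum(AM[i][i] for i in range(n)) / k  # c_{n-k} = -tr(A M_k) / k
    return c


def poly_at_matrix(coeffs, A):
    """Evaluate sum_k coeffs[k] A^k by Horner's rule."""
    n = len(A)
    result = [[Fraction(0)] * n for _ in range(n)]
    for c in reversed(coeffs):
        result = matmul(result, A)
        for i in range(n):
            result[i][i] += c
    return result


samples = [
    [[1, 2], [3, 4]],
    [[1, 2, 3], [4, 5, 6], [7, 8, 10]],
    [[0, 1, 0], [0, 0, 1], [2, -3, 4]],
    [[2, 1, 0, 0], [0, 2, 0, 0], [0, 0, 3, 1], [1, 0, 0, 3]],
]
for A in samples:
    FA = poly_at_matrix(char_poly_fl(A), A)
    print(all(x == 0 for row in FA for x in row))  # True for each sample
```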

Corollary 5.10. If A is in $M_n(K)$, then the minimal polynomial of A divides the characteristic polynomial of A.

PROOF. Theorem 5.9 shows that the characteristic polynomial of A lies in the ideal of all polynomials vanishing on A. Then the corollary follows from Proposition 5.8. □

For our matrix A in $M_n(K)$, let F(X) be the characteristic polynomial, and let M(X) be the minimal polynomial. By unique factorization (Theorem 1.17), the monic polynomial F(X) has a factorization into powers of distinct prime monic polynomials of the form

$$F(X) = P_1(X)^{k_1} \cdots P_r(X)^{k_r},$$

and this factorization is unique up to the order of the factors. Since M(X) is a monic polynomial dividing F(X), we must have

$$M(X) = P_1(X)^{l_1} \cdots P_r(X)^{l_r}$$

with $l_1 \leq k_1, \dots, l_r \leq k_r$, by the same argument that deduced Corollary 1.7 from unique factorization in the ring of integers. We shall see shortly that $k_j > 0$ implies $l_j > 0$ if $P_j(X)$ is of degree 1, i.e., if $P_j(X)$ is of the form $X - \lambda_0$; in other words, if $\lambda_0$ is an eigenvalue of A, then $X - \lambda_0$ divides its minimal polynomial. We return to this point in a moment. Problem 31 at the end of the chapter will address the same question when $P_j(X)$ has degree $> 1$.

Examples.

(1) In the 2-by-2 case, $\begin{pmatrix} c & 0 \\ 0 & c \end{pmatrix}$ has minimal polynomial $M(X) = X - c$, and $\begin{pmatrix} c & 1 \\ 0 & c \end{pmatrix}$ has $M(X) = (X - c)^2$. Both matrices have characteristic polynomial $F(X) = (X - c)^2$.

(2) The k-by-k matrix

$$\begin{pmatrix} c & 1 & 0 & \cdots & 0 & 0 \\ 0 & c & 1 & \cdots & 0 & 0 \\ \vdots & & \ddots & \ddots & & \vdots \\ 0 & 0 & 0 & \cdots & c & 1 \\ 0 & 0 & 0 & \cdots & 0 & c \end{pmatrix}$$

with c in every diagonal entry, with 1 in every entry just above the diagonal, and with 0 elsewhere has minimal polynomial $M(X) = (X - c)^k$ and characteristic polynomial $F(X) = (X - c)^k$.

(3) If a matrix A is made up exclusively of several blocks of the type in Example 2 with the same c in each case, the ith block being of size $k_i$, then the minimal polynomial is $M(X) = (X - c)^{\max_i k_i}$, and the characteristic polynomial is $F(X) = (X - c)^{\sum_i k_i}$.

(4) If A is made up exclusively of several blocks as in Example 3 but with c different for each block, then the minimal and characteristic polynomials for A are obtained by multiplying the minimal and characteristic polynomials obtained from Example 3 for the various c's.
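Examples 1 through 3 can be checked by machine. When the characteristic polynomial is $(X - c)^n$, Corollary 5.10 forces the minimal polynomial to be $(X - c)^m$, where m is the least exponent with $(A - cI)^m = 0$. The sketch below (hypothetical helpers `jordan_block`, `block_diag`, and `min_poly_exponent`) computes that m for the matrices of Examples 1 through 3 with the sample value $c = 5$.

```python
def matmul(A, B):
    """Matrix product of A and B (lists of rows)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def jordan_block(c, k):
    """k-by-k matrix with c on the diagonal and 1 just above it (Example 2)."""
    return [[c if j == i else (1 if j == i + 1 else 0) for j in range(k)] for i in range(k)]


def block_diag(*blocks):
    """Square matrix with the given square blocks along the diagonal and 0 elsewhere."""
    n = sum(len(b) for b in blocks)
    out = [[0] * n for _ in range(n)]
    offset = 0
    for b in blocks:
        for i, row in enumerate(b):
            for j, x in enumerate(row):
                out[offset + i][offset + j] = x
        offset += len(b)
    return out


def min_poly_exponent(A, c):
    """Least m with (A - cI)^m = 0, assuming det(XI - A) = (X - c)^n; then M(X) = (X - c)^m."""
    n = len(A)
    N = [[A[i][j] - (c if i == j else 0) for j in range(n)] for i in range(n)]
    power = [[int(i == j) for j in range(n)] for i in range(n)]  # N^0 = I
    for m in range(n + 1):
        if all(x == 0 for row in power for x in row):
            return m
        power = matmul(power, N)
    raise ValueError("A - cI is not nilpotent")


c = 5
print(min_poly_exponent([[c, 0], [0, c]], c))    # 1  (Example 1)
print(min_poly_exponent([[c, 1], [0, c]], c))    # 2  (Example 1)
print(min_poly_exponent(jordan_block(c, 4), c))  # 4  (Example 2)
print(min_poly_exponent(block_diag(jordan_block(c, 3), jordan_block(c, 1),
                                   jordan_block(c, 2)), c))  # 3 = max(3, 1, 2)  (Example 3)
```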

To proceed further, let us change our point of view, working with linear maps $L : V \to V$, where V is a finite-dimensional vector space over K. We have already defined the characteristic polynomial of L to be the characteristic polynomial of the matrix of L in any ordered basis; this is well defined because similar matrices have the same characteristic polynomial. In analogous fashion we can define the minimal polynomial of L to be the minimal polynomial of the matrix of L in any ordered basis; this is well defined since, as we have seen, the set of polynomials P in one indeterminate with $P(A) = 0$ is the same as the set with $P(C^{-1}AC) = 0$ if C is invertible.

Another way of approaching the matter of the minimal polynomial of L is to define P(L) for any polynomial P in one indeterminate. As with matrices, we can define P(L) either concretely by substituting L for X in the expression for P(X), or we can define P(L) abstractly by appealing to the universal mapping property in Proposition 4.24. For the latter we work with the subring T of linear maps from V to itself generated by KI and L. This subring is commutative. We let $\varphi : K \to T$ be given by $\varphi(c) = cI$, and we use Proposition 4.24 to obtain the unique ring homomorphism $\Phi : K[X] \to T$ such that $\Phi(c) = cI$ for all $c \in K$ and $\Phi(X) = L$. Then P(L) is the element $\Phi(P)$ of T. Once P(L) is defined, we observe that the set of polynomials P(X) such that $P(L) = 0$ is a nonzero ideal in $K[X]$; Proposition 5.8 yields a unique monic polynomial of lowest degree in this ideal, and that is the minimal polynomial of L.

Linear maps enable us to make convenient use of invariant subspaces. Recall from earlier in the section that a vector subspace U of V is said to be invariant under the linear map $L : V \to V$ if $L(U) \subseteq U$; in this case we obtain associated linear maps $L|_U : U \to U$ and $\overline{L} : V/U \to V/U$. Relationships among the characteristic polynomials and minimal polynomials of these linear maps are given in the next two propositions.

Proposition 5.11. Let V be a finite-dimensional vector space over K, let $L : V \to V$ be linear, let U be a proper nonzero invariant subspace under L, and let $\overline{L} : V/U \to V/U$ be the induced linear map on V/U. Then the characteristic polynomials of L, $L|_U$, and $\overline{L}$ are related by

$$\det(XI - L) = \det(XI - L|_U) \det(XI - \overline{L}).$$

PROOF. Let $\Gamma_U = (v_1, \dots, v_k)$ be an ordered basis of U, and extend $\Gamma_U$ to an ordered basis $\Gamma = (v_1, \dots, v_n)$ of V. Then $\overline{\Gamma} = (v_{k+1} + U, \dots, v_n + U)$ is an ordered basis of V/U. Since U is invariant under L, the matrix of L in the ordered basis $\Gamma$ is of the form $\begin{pmatrix} A & * \\ 0 & D \end{pmatrix}$, where A is the matrix of $L|_U$ in the ordered basis $\Gamma_U$ and D is the matrix of $\overline{L}$ in the ordered basis $\overline{\Gamma}$. Passing to the characteristic polynomials and applying Proposition 5.1g, we obtain the desired conclusion. □

Proposition 5.12. Let V be a finite-dimensional vector space over K, let $L : V \to V$ be linear, let U be a proper nonzero invariant subspace under L, and let $\overline{L} : V/U \to V/U$ be the induced linear map on V/U. Then the minimal polynomials of $L|_U$ and $\overline{L}$ divide the minimal polynomial of L.

PROOF. Let N(X) be the minimal polynomial of $L|_U$. Then N(X) is the unique monic polynomial of lowest degree in the ideal of all polynomials P(X) such that $P(L)u = 0$ for all u in U. The minimal polynomial M(X) of L has this property because $M(L)v = 0$ for all v in V. Therefore M(X) is in the ideal and is the product of N(X) and some other polynomial.

Among linear maps S from V into V carrying U into itself, the function $S \mapsto \overline{S}$ sending S to the linear map $\overline{S}$ induced on V/U is a homomorphism of rings. It follows that if P(X) is a polynomial with $P(L) = 0$, then $P(\overline{L}) = 0$. Taking P(X) to be the minimal polynomial of L, we see that the minimal polynomial of L is in the ideal of polynomials vanishing on $\overline{L}$. Therefore it is the product of the minimal polynomial of $\overline{L}$ and some other polynomial. □

Let us come back to the unproved assertion before the examples, namely that $k_j > 0$ implies $l_j > 0$ if $P_j(X)$ has degree 1. We prove the linear-function version of this statement as a corollary of Proposition 5.12.

Corollary 5.13. If $L : V \to V$ is linear on a finite-dimensional vector space over K and if a first-degree polynomial $X - \lambda_0$ divides the characteristic polynomial of L, then $X - \lambda_0$ divides the minimal polynomial of L.

PROOF. If $X - \lambda_0$ divides the characteristic polynomial, then $\lambda_0$ is an eigenvalue of L, say with v as an eigenvector. Then $U = Kv$ is an invariant subspace under L, and the characteristic and minimal polynomials of $L|_U$ are both $X - \lambda_0$. By Proposition 5.12, $X - \lambda_0$ divides the minimal polynomial of L. □

Theorem 5.14. If $L : V \to V$ is linear on a finite-dimensional vector space over K, then L has a basis of eigenvectors if and only if the minimal polynomial M(X) of L is the product of distinct factors of degree 1; in this case, M(X) equals $(X - \lambda_1) \cdots (X - \lambda_k)$, where $\lambda_1, \dots, \lambda_k$ are the distinct eigenvalues of L. Consequently a matrix A in $M_n(K)$ is similar to a diagonal matrix if and only if its minimal polynomial is the product of distinct factors of degree 1.

PROOF. For the easy direction, suppose that $v_1, \dots, v_n$ are the members of a basis of eigenvectors for L with respective eigenvalues $\mu_1, \dots, \mu_n$. In this case, let $\lambda_1, \dots, \lambda_k$ be the distinct members of the set of eigenvalues, with $\mu_i = \lambda_{j(i)}$ for some function $j : \{1, \dots, n\} \to \{1, \dots, k\}$. Then $(L - \lambda_jI)(v) = 0$ for v equal to any $v_i$ with $j(i) = j$. Since the linear maps $L - \lambda_jI$ commute as j varies, $\prod_{j=1}^k (L - \lambda_jI)(v) = 0$ for v equal to each of $v_1, \dots, v_n$, hence for all v. Therefore the minimal polynomial M(X) of L divides $\prod_{j=1}^k (X - \lambda_j)$. On the other hand, Corollary 5.13 shows that each $X - \lambda_j$ divides M(X), so that $\deg M(X) \geq k$. Hence $M(X) = \prod_{j=1}^k (X - \lambda_j)$.

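Conversely suppose that $M(X) = \prod_{j=1}^k (X - \lambda_j)$ with the $\lambda_j$ distinct. If $S_1$ is the linear map $S_1 = \prod_{j=2}^k (L - \lambda_jI)$, then the formula for M(X) shows that $(L - \lambda_1I)S_1(v) = 0$ for all v in V, and hence $\operatorname{image} S_1$ is a vector subspace of the eigenspace of L for the eigenvalue $\lambda_1$. If v is in $\ker S_1 \cap \operatorname{image} S_1$, we then have $0 = S_1(v) = \prod_{j=2}^k (L - \lambda_jI)(v) = \prod_{j=2}^k (\lambda_1 - \lambda_j)v$. Since $\lambda_1$ is distinct from $\lambda_2, \dots, \lambda_k$, we conclude that $v = 0$, hence that $\ker S_1 \cap \operatorname{image} S_1 = 0$. Since $\dim \ker S_1 + \dim \operatorname{image} S_1 = \dim V$, Corollary 2.29 therefore gives

$$\begin{aligned} \dim V &= \dim \ker S_1 + \dim \operatorname{image} S_1 \\ &= \dim(\ker S_1 + \operatorname{image} S_1) + \dim(\ker S_1 \cap \operatorname{image} S_1) \\ &= \dim(\ker S_1 + \operatorname{image} S_1). \end{aligned}$$

Hence $V = \ker S_1 + \operatorname{image} S_1$. Since $\ker S_1 \cap \operatorname{image} S_1 = 0$, we conclude that $V = \ker S_1 \oplus \operatorname{image} S_1$.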

Actually, the same calculation of $S_1(v)$ as above shows that $\operatorname{image}S_1$ is the full eigenspace of $L$ for the eigenvalue $\lambda_1$. In fact, if $L(v)=\lambda_1v$, then $S_1(v)=\prod_{j=2}^k(\lambda_1-\lambda_j)v$, and hence $v$ equals the image under $S_1$ of $\bigl(\prod_{j=2}^k(\lambda_1-\lambda_j)\bigr)^{-1}v$.

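Next, since $L$ commutes with $S_1$, $\ker S_1$ is an invariant subspace under $L$, and $\lambda_1$ is not an eigenvalue of $L|_{\ker S_1}$, because the eigenspace for $\lambda_1$ equals $\operatorname{image}S_1$, which meets $\ker S_1$ only in 0. Thus $X-\lambda_1$ does not divide the minimal polynomial of $L|_{\ker S_1}$. On the other hand, $S_1$ vanishes on the eigenspaces of $L$ for eigenvalues $\lambda_2,\dots,\lambda_k$, so that these eigenspaces lie in $\ker S_1$, and Corollary 5.13 shows for $j\ge2$ that $X-\lambda_j$ divides the minimal polynomial of $L|_{\ker S_1}$. Taking Proposition 5.12 into account, we conclude that $L|_{\ker S_1}$ has minimal polynomial $\prod_{j=2}^k(X-\lambda_j)$. We have succeeded in splitting off the eigenspace of $L$ under $\lambda_1$ as a direct summand and reducing the proposition to the case of $k-1$ eigenvalues. Thus induction shows that $\ker S_1$ is the direct sum of its eigenspaces for the eigenvalues $\lambda_2,\dots,\lambda_k$. Hence $V$ is the direct sum of the eigenspaces of $L$ for $\lambda_1,\dots,\lambda_k$, and $L$ has a basis of eigenvectors. $\square$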
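For a concrete matrix whose distinct eigenvalues are already known, the criterion of Theorem 5.14 reduces to one matrix computation: $A$ is similar to a diagonal matrix exactly when $\prod_j(A-\lambda_jI)=0$. The Python sketch below performs that check. The function names are hypothetical, the caller is assumed to supply the distinct eigenvalues, and the arithmetic is done exactly over $\mathbb{Q}$ with `fractions.Fraction` so that no rounding can hide a nonzero entry.

```python
from fractions import Fraction


def mat_mul(A, B):
    """Product of square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def shifted(A, lam):
    """Return A - lam*I with exact rational entries."""
    n = len(A)
    return [[Fraction(A[i][j]) - (lam if i == j else 0) for j in range(n)]
            for i in range(n)]


def product_vanishes(A, eigenvalues):
    """True when (A - l_1 I)(A - l_2 I)...(A - l_k I) is the zero matrix.

    If `eigenvalues` lists the distinct eigenvalues of A, Theorem 5.14 says
    this happens exactly when A is similar to a diagonal matrix.
    """
    n = len(A)
    P = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for lam in eigenvalues:
        P = mat_mul(P, shifted(A, Fraction(lam)))
    return all(x == 0 for row in P for x in row)
```

For instance, `product_vanishes([[2, 1], [0, 3]], [2, 3])` is `True`. The Jordan block `[[2, 1], [0, 2]]` with eigenvalue list `[2]` gives `False`, since its minimal polynomial is $(X-2)^2$.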

Theorem 5.14 comes close to solving the canonical-form problem for similarity in the case of one kind of square matrices: if the minimal polynomial of $A$ is the product of distinct factors of degree 1, then $A$ is similar to a diagonal matrix. To complete the solution for this case, all we have to do is to say when two diagonal matrices are similar to each other; this step is handled by the following easy proposition.

Proposition 5.15. Two diagonal matrices $A$ and $A'$ in $M_n(\mathbb{K})$ with respective diagonal entries $d_1,\dots,d_n$ and $d_1',\dots,d_n'$ are similar if and only if there is a permutation $\sigma$ in $\mathfrak{S}_n$ such that $d_j'=d_{\sigma(j)}$ for all $j$.

PROOF. The respective characteristic polynomials are $\prod_{j=1}^n(X-d_j)$ and $\prod_{j=1}^n(X-d_j')$. If $A$ and $A'$ are similar, then the characteristic polynomials are equal, and unique factorization (Theorem 1.17) shows that the factors $X-d_j'$ match the factors $X-d_j$ up to order. Conversely if there is a permutation $\sigma$ in $\mathfrak{S}_n$ such that $d_j'=d_{\sigma(j)}$ for all $j$, then the matrix $C$ whose $j$th column is $e_{\sigma(j)}$ has the property that $A'=C^{-1}AC$. $\square$
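The converse direction of Proposition 5.15 is easy to test numerically. This sketch, which uses hypothetical helper names, 0-based indices, and integer entries, builds the matrix $C$ whose $j$th column is $e_{\sigma(j)}$. It relies on the fact that a permutation matrix satisfies $C^{-1}=C^{\mathsf T}$.

```python
def permutation_matrix(sigma):
    """Matrix C whose j-th column is e_{sigma(j)} (indices start at 0)."""
    n = len(sigma)
    return [[1 if i == sigma[j] else 0 for j in range(n)] for i in range(n)]


def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def conjugate_diagonal(d, sigma):
    """Return C^{-1} A C for A = diag(d); a permutation matrix has C^{-1} = C^T."""
    n = len(d)
    A = [[d[i] if i == j else 0 for j in range(n)] for i in range(n)]
    C = permutation_matrix(sigma)
    C_inv = [[C[j][i] for j in range(n)] for i in range(n)]  # transpose of C
    return mat_mul(C_inv, mat_mul(A, C))
```

With `d = [5, 7, 9]` and `sigma = [2, 0, 1]`, the result is the diagonal matrix with entries 9, 5, 7, that is, $d_j'=d_{\sigma(j)}$.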

To  proceed  further  with  obtaining  canonical  forms  for  matrices  under  similarity 
and  for  linear  maps  under  isomorphism,  we  shall  use  linear  maps  in  ways  that 
we  have  not  used  them  before.  In  particular,  it  will  be  convenient  to  be  able  to 
recognize  direct-sum  decompositions  from  properties  of  linear  maps.  We  take  up 
this  matter  in  the  next  section. 


4.  Projection  Operators 

In this section we shall see how to recognize direct-sum decompositions of a vector space $V$ from the associated projection operators, and we shall relate these operators to invariant subspaces under a linear map $L:V\to V$.

If $V=U_1\oplus U_2$, then the function $E_1$ defined by $E_1(u_1+u_2)=u_1$ when $u_1$ is in $U_1$ and $u_2$ is in $U_2$ is linear, satisfies $E_1^2=E_1$, and has $\operatorname{image}E_1=U_1$ and $\ker E_1=U_2$. We call $E_1$ the projection of $V$ on $U_1$ along $U_2$. A decomposition of $V$ as the direct sum of two vector spaces, when the first of the two spaces is singled out, therefore determines a projection operator uniquely. A converse is as follows.

Proposition 5.16. If $V$ is a vector space and $E_1:V\to V$ is a linear map such that $E_1^2=E_1$, then there exists a direct-sum decomposition $V=U_1\oplus U_2$ such that $E_1$ is the projection of $V$ on $U_1$ along $U_2$. In this case, $(I-E_1)^2=I-E_1$, and $I-E_1$ is the projection of $V$ on $U_2$ along $U_1$.

PROOF. Define $U_1=\operatorname{image}E_1$ and $U_2=\ker E_1$. If $v$ is in $\operatorname{image}E_1\cap\ker E_1$, then $E_1(v)=0$ since $v$ is in $\ker E_1$, and $v=E_1(w)$ for some $w$ in $V$ since $v$ is in $\operatorname{image}E_1$. Then $0=E_1(v)=E_1^2(w)=E_1(w)=v$, and therefore $\operatorname{image}E_1\cap\ker E_1=0$.

If $v\in V$ is given, write $v=E_1(v)+(I-E_1)(v)$. Then $E_1(v)$ is in $\operatorname{image}E_1$, and the computation $E_1(I-E_1)(v)=(E_1-E_1^2)(v)=(E_1-E_1)(v)=0$ shows that $(I-E_1)(v)$ is in $\ker E_1$. Consequently $V=\operatorname{image}E_1+\ker E_1$, and we conclude that $V=\operatorname{image}E_1\oplus\ker E_1$.

Hence $V=U_1\oplus U_2$, where $U_1=\operatorname{image}E_1$ and $U_2=\ker E_1$. In this notation, $E_1$ is 0 on $U_2$. If $v$ is in $U_1$, then $v=E_1(w)$ for some $w$, and we have $v=E_1(w)=E_1^2(w)=E_1(E_1(w))=E_1(v)$. Thus $E_1$ is the identity on $U_1$ and is the projection as asserted.

For $(I-E_1)^2$, we have $(I-E_1)^2=I-2E_1+E_1^2=I-2E_1+E_1=I-E_1$, and $I-E_1$ is a projection. It is the identity on $U_2$ and is 0 on $U_1$, hence is the projection of $V$ on $U_2$ along $U_1$. $\square$
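As a small check of Proposition 5.16, take the idempotent matrix $E_1=\begin{pmatrix}1&1\\0&0\end{pmatrix}$ on $\mathbb{Q}^2$, an illustrative choice not taken from the text. Its image is spanned by $(1,0)$ and its kernel by $(1,-1)$, so it projects along a line that is not perpendicular to $U_1$. The sketch, with hypothetical helper names, verifies that $I-E_1$ is again idempotent and that the two projections annihilate each other.

```python
def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def complement(E):
    """Return I - E."""
    n = len(E)
    return [[(1 if i == j else 0) - E[i][j] for j in range(n)] for i in range(n)]


def is_idempotent(E):
    return mat_mul(E, E) == E


E1 = [[1, 1], [0, 0]]   # image spanned by (1, 0), kernel spanned by (1, -1)
E2 = complement(E1)     # I - E1 = [[0, -1], [0, 1]]
```

Here $E_2=I-E_1$ fixes $(1,-1)$ and kills $(1,0)$, as the last paragraph of the proof predicts.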

Let us generalize these considerations to the situation that $V$ is the direct sum of $r$ vector subspaces. The following facts about the situation in Proposition 5.16, with the definition $E_2=I-E_1$, are relevant to formulating the generalization:

(i) $E_1$ and $E_2$ have $E_1^2=E_1$ and $E_2^2=E_2$,

(ii) $E_1E_2=E_2E_1=0$,

(iii) $E_1+E_2=I$.

Suppose that $V=U_1\oplus\cdots\oplus U_r$. Define $E_j(u_1+\cdots+u_r)=u_j$. Then $E_j$ is linear from $V$ to itself with $E_j^2=E_j$, and Proposition 5.16 shows that $E_j$ is the projection of $V$ on $U_j$ along the direct sum of the remaining $U_i$'s. The linear maps $E_1,\dots,E_r$ then satisfy

(i′) $E_j^2=E_j$ for $1\le j\le r$,

(ii′) $E_iE_j=0$ if $i\ne j$,

(iii′) $E_1+\cdots+E_r=I$.

A  converse  is  as  follows. 

Proposition 5.17. If $V$ is a vector space and $E_j:V\to V$ for $1\le j\le r$ are linear maps such that

(a) $E_iE_j=0$ if $i\ne j$, and

(b) $E_1+\cdots+E_r=I$,

then $E_j^2=E_j$ for $1\le j\le r$ and the vector subspaces $U_j=\operatorname{image}E_j$ have the properties that $V=U_1\oplus\cdots\oplus U_r$ and that $E_j$ is the projection of $V$ on $U_j$ along the direct sum of all $U_i$ but $U_j$.

PROOF. Multiplying (b) through by $E_j$ on the left and applying (a) to each term on the left side except the $j$th, we obtain $E_j^2=E_j$. Therefore, for each $j$, $E_j$ is a projection on $U_j$ along some vector subspace depending on $j$.

If $v$ is in $V$, then (b) gives $v=E_1(v)+\cdots+E_r(v)$ and shows that $V=U_1+\cdots+U_r$. Suppose that $v$ is in the intersection of $U_j$ with the sum of the other $U_i$'s. Write $v=\sum_{i\ne j}u_i$ with $u_i=E_i(w_i)$ in $U_i$. Applying $E_j$ and using the fact that $v$ is in $U_j$, we obtain $v=E_j(v)=\sum_{i\ne j}E_jE_i(w_i)$. Every term of the right side is 0 by (a), and hence $v=0$. Thus $V=U_1\oplus\cdots\oplus U_r$.

Since $E_jE_i=0$ for $i\ne j$, $E_j$ is 0 on each $U_i$ for $i\ne j$. Therefore the sum of all $U_i$ except $U_j$ is contained in the kernel of $E_j$. Conversely, if $v$ is in $\ker E_j$, write $v=u_j+s$ with $u_j$ in $U_j$ and $s$ in the sum of the other $U_i$'s; applying $E_j$ gives $0=u_j$, so that $v=s$. Hence the sum of all $U_i$ except $U_j$ is exactly equal to the kernel of $E_j$. This completes the proof. $\square$
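Maps satisfying (a) and (b) of Proposition 5.17 can be manufactured from any ordered basis. If the columns of an invertible matrix $P$ are basis vectors $b_1,\dots,b_r$ of $\mathbb{Q}^r$, then $E_j=b_j\cdot(\text{row } j\text{ of }P^{-1})$ is the projection on the line $\mathbb{Q}b_j$ along the span of the other $b_i$. The sketch below hard-codes one such $P$ and its inverse. This is an illustrative choice, not from the text, and the helper names are hypothetical. The sketch then checks (a), (b), and the conclusion $E_j^2=E_j$.

```python
def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def outer(col, row):
    """Rank-one matrix col * row."""
    return [[c * r for r in row] for c in col]


def satisfies_a_and_b(Es):
    """(a) E_i E_j = 0 for i != j, and (b) E_1 + ... + E_r = I."""
    n, r = len(Es[0]), len(Es)
    zero = [[0] * n for _ in range(n)]
    ident = [[int(i == j) for j in range(n)] for i in range(n)]
    total = [[sum(E[i][j] for E in Es) for j in range(n)] for i in range(n)]
    orthogonal = all(mat_mul(Es[i], Es[j]) == zero
                     for i in range(r) for j in range(r) if i != j)
    return orthogonal and total == ident


# Illustrative basis b1 = (1,0,0), b2 = (1,1,0), b3 = (1,1,1) as the columns of P.
P = [[1, 1, 1], [0, 1, 1], [0, 0, 1]]
P_inv = [[1, -1, 0], [0, 1, -1], [0, 0, 1]]
# E_j = (column j of P) times (row j of P^{-1}).
Es = [outer([P[i][j] for i in range(3)], P_inv[j]) for j in range(3)]
```

For example, the middle projection is $E_2=\begin{pmatrix}0&1&-1\\0&1&-1\\0&0&0\end{pmatrix}$.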

Proposition 5.18. Suppose that a vector space $V$ is a direct sum $V=U_1\oplus\cdots\oplus U_r$ of vector subspaces, that $E_1,\dots,E_r$ are the corresponding projections, and that $L:V\to V$ is linear. Then all the subspaces $U_j$ are invariant under $L$ if and only if $LE_j=E_jL$ for all $j$.

PROOF. If $L(U_j)\subseteq U_j$ for all $j$, then $i\ne j$ implies $E_iL(U_j)\subseteq E_i(U_j)=0$ and $LE_i(U_j)=L(0)=0$. Also, $v\in U_i$ implies $E_iL(v)=L(v)=LE_i(v)$. Since $V$ is the sum of the $U_j$'s, $E_iL=LE_i$ for all $i$.

Conversely if $E_jL=LE_j$ and if $v$ is in $U_j$, then $E_jL(v)=LE_j(v)=L(v)$ shows that $L(v)$ is in $U_j$. Therefore $L(U_j)\subseteq U_j$ for all $j$. $\square$
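To see Proposition 5.18 on an example, take $U_j=\mathbb{Q}b_j$ for the basis $b_1=(1,0,0)$, $b_2=(1,1,0)$, $b_3=(1,1,1)$ of $\mathbb{Q}^3$, an illustrative choice whose projections $E_j$ are hard-coded below. The map $L=2E_1+3E_2+5E_3$ scales each line, so it should commute with every $E_j$. A map sending $b_1=e_1$ to $e_2$, which is not on the line $U_1$, should fail to commute. The helper name is hypothetical.

```python
def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def commutes_with_all(L, projections):
    """Proposition 5.18: every U_j is L-invariant iff L E_j = E_j L for all j."""
    return all(mat_mul(L, E) == mat_mul(E, L) for E in projections)


# Projections of Q^3 on the lines through b1 = (1,0,0), b2 = (1,1,0), b3 = (1,1,1)
# along the span of the other two basis vectors.
E = [
    [[1, -1, 0], [0, 0, 0], [0, 0, 0]],
    [[0, 1, -1], [0, 1, -1], [0, 0, 0]],
    [[0, 0, 1], [0, 0, 1], [0, 0, 1]],
]
# L = 2*E1 + 3*E2 + 5*E3 scales each line, so each line is invariant under L.
L = [[2 * a + 3 * b + 5 * c for a, b, c in zip(r1, r2, r3)] for r1, r2, r3 in zip(*E)]
```

Here `L` works out to `[[2, 1, 2], [0, 3, 2], [0, 0, 5]]`, and `commutes_with_all(L, E)` is `True`.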


5.  Primary  Decomposition 

For the case that the minimal polynomial of a linear map $L:V\to V$ is the product of distinct factors of degree 1, Theorem 5.14 showed that $V$ is a direct sum of its eigenspaces. The proof used elementary vector-space techniques from Chapter II but did not take full advantage of the machinery developed in the present chapter for passing back and forth between polynomials in one indeterminate and the values of polynomials on $L$. Let us therefore rework the proof of that theorem, taking into account the discussion of projections in Section 4.

We seek an eigenspace decomposition $V=V_{\lambda_1}\oplus\cdots\oplus V_{\lambda_k}$ relative to $L$. Proposition 5.17 suggests looking for the corresponding decomposition of the identity operator as a sum of projections: $I=E_1+\cdots+E_k$. According to that proposition, we obtain a direct-sum decomposition as soon as we obtain this kind of sum of linear maps such that $E_iE_j=0$ for $i\ne j$. The $E_j$'s will automatically be projections.

The proof of Theorem 5.14 showed that $S_1=\prod_{j=2}^k(L-\lambda_jI)$ has image equal to the kernel of $L-\lambda_1I$, i.e., equal to the eigenspace for eigenvalue $\lambda_1$. If $v$ is in this eigenspace, then $S_1(v)=\prod_{j=2}^k(\lambda_1-\lambda_j)v$. Hence $E_1=c_1S_1$, where $c_1^{-1}=\prod_{j=2}^k(\lambda_1-\lambda_j)$. The linear map $S_1$ equals $Q_1(L)$, where $Q_1(X)=\prod_{j=2}^k(X-\lambda_j)$. Thus $E_1=c_1Q_1(L)$. Similar remarks apply to the other eigenspaces, and therefore the required decomposition of the identity operator has to be of the form $I=c_1Q_1(L)+\cdots+c_kQ_k(L)$ with $c_1,\dots,c_k$ equal to certain scalars.

The polynomials $Q_1(X),\dots,Q_k(X)$ are at hand from the start, each containing all but one factor of the minimal polynomial. Moreover, $i\ne j$ implies that

$$Q_i(L)Q_j(L)=\Bigl(\prod_{l=1}^k(L-\lambda_lI)\Bigr)\Bigl(\prod_{l\ne i,j}(L-\lambda_lI)\Bigr).$$

The first factor on the right side is the value of the minimal polynomial of $L$ with $L$ substituted for $X$. Hence the right side is 0, and we see that our linear maps $E_1,\dots,E_k$ have $E_iE_j=0$ for $i\ne j$.
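The formula $E_j=c_jQ_j(L)$ with $c_j^{-1}=\prod_{l\ne j}(\lambda_j-\lambda_l)$ is Lagrange interpolation in disguise. The polynomial $\sum_jc_jQ_j(X)$ has degree less than $k$ and takes the value 1 at each $\lambda_j$, so it is the constant 1, which is why the $E_j$ add up to $I$. The sketch below computes these $E_j$ for an illustrative upper-triangular matrix with eigenvalues 2, 3, 5 and checks both identities. The names are hypothetical and the arithmetic is exact over $\mathbb{Q}$.

```python
from fractions import Fraction


def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def eigenprojections(A, eigenvalues):
    """E_j = c_j Q_j(A), Q_j(X) = prod_{l != j} (X - lam_l), c_j = 1 / Q_j(lam_j).

    Assumes the minimal polynomial of A is the product of X - lam over the
    given distinct eigenvalues.
    """
    n = len(A)
    projections = []
    for j, lam_j in enumerate(eigenvalues):
        E = [[Fraction(int(r == s)) for s in range(n)] for r in range(n)]
        for l, lam_l in enumerate(eigenvalues):
            if l != j:
                # multiply by (A - lam_l I) / (lam_j - lam_l)
                factor = [[(Fraction(A[r][s]) - (lam_l if r == s else 0)) / (lam_j - lam_l)
                           for s in range(n)] for r in range(n)]
                E = mat_mul(E, factor)
        projections.append(E)
    return projections
```

For $A=\begin{pmatrix}2&1&2\\0&3&2\\0&0&5\end{pmatrix}$ the first projection comes out as $E_1=(A-3I)(A-5I)/3=\begin{pmatrix}1&-1&0\\0&0&0\\0&0&0\end{pmatrix}$.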

As soon as we allow nonconstant coefficients in place of the $c_j$'s in the above argument, we obtain a generalization of Theorem 5.14 to the situation that the minimal polynomial of $L$ is arbitrary. The prime factors of the minimal polynomial need not even be of degree 1. Hence the theorem applies to all $L$'s even if $\mathbb{K}$ is not algebraically closed.

Theorem 5.19 (Primary Decomposition Theorem). Let $L:V\to V$ be linear on a finite-dimensional vector space over $\mathbb{K}$, and let $M(X)=P_1(X)^{l_1}\cdots P_k(X)^{l_k}$ be the unique factorization of the minimal polynomial $M(X)$ of $L$ into the product of powers of distinct monic prime polynomials $P_j(X)$. Define $U_j=\ker(P_j(L)^{l_j})$ for $1\le j\le k$. Then

(a) $V=U_1\oplus\cdots\oplus U_k$,

(b) the projection $E_j$ of $V$ on $U_j$ along the sum of the other $U_i$'s is of the form $T_j(L)$ for some polynomial $T_j$,

(c) each vector subspace $U_j$ is invariant under $L$,

(d) any linear map from $V$ to itself that commutes with $L$ carries each $U_j$ into itself,

(e) any vector subspace $W$ invariant under $L$ has the property that
$$W=(W\cap U_1)\oplus\cdots\oplus(W\cap U_k),$$

(f) the minimal polynomial of $L_j=L|_{U_j}$ is $P_j(X)^{l_j}$.

Remarks.  The  decomposition  in  (a)  is  called  the  primary  decomposition  of 
V  under  L,  and  the  vector  subspaces  Uj  are  called  the  primary  subspaces  of  V 
under  L. 

PROOF. For $1\le j\le k$, define $Q_j(X)=M(X)/P_j(X)^{l_j}$. The ideal in $\mathbb{K}[X]$ generated by $Q_1(X),\dots,Q_k(X)$ consists of all products of a single monic polynomial $D(X)$ by arbitrary polynomials, according to Proposition 5.8, and $D(X)$ has to divide each $Q_j(X)$. Since $Q_j(X)=\prod_{i\ne j}P_i(X)^{l_i}$, $D(X)$ cannot be divisible by any $P_j(X)$, and consequently $D(X)=1$. Thus there exist polynomials $R_1(X),\dots,R_k(X)$ such that

$$1=Q_1(X)R_1(X)+\cdots+Q_k(X)R_k(X).$$

Define $E_j=Q_j(L)R_j(L)$, so that $E_1+\cdots+E_k=I$. If $i\ne j$, then $Q_i(X)Q_j(X)=M(X)\prod_{r\ne i,j}P_r(X)^{l_r}$. Since $M(L)=0$, we see that $E_iE_j=0$.

Proposition 5.17 says that each $E_j$ is a projection. Also, it says that if $U_j$ denotes $\operatorname{image}E_j$, then $V=U_1\oplus\cdots\oplus U_k$, and $E_j$ is the projection on $U_j$ along the sum of the other $U_i$'s. With this definition of the $U_j$'s (rather than the one in the statement of the theorem), we have therefore shown that (a) and (b) hold.

Let us see that conclusions (c), (d), and (e) follow from (b). Conclusion (c) holds by Proposition 5.18 since $L$ commutes with $T_j(L)$ whenever $T_j$ is a polynomial. For (d), if $J:V\to V$ is a linear map commuting with $L$, then $J$ commutes with each $E_j$ since (b) shows that each $E_j$ is of the form $T_j(L)$. From Proposition 5.18 we conclude that each $U_j$ is invariant under $J$. For (e), the subspace $W$ certainly contains $(W\cap U_1)\oplus\cdots\oplus(W\cap U_k)$. For the reverse containment suppose $w$ is in $W$. Since $E_j$ is of the form $T_j(L)$ and since $W$ is invariant under $L$, $E_j(w)$ is in $W$. But also $E_j(w)$ is in $U_j$. Therefore the expansion $w=\sum_{j=1}^kE_j(w)$ exhibits $w$ as the sum of members of the spaces $W\cap U_j$.

Next let us prove that $U_j$, as we have defined it, is given also by the definition in the statement of the theorem. In other words, let us prove that

$$\operatorname{image}E_j=\ker(P_j(L)^{l_j}).\qquad(*)$$

We need a preliminary fact. The polynomial $P_j(X)^{l_j}$ has the property that $M(X)=P_j(X)^{l_j}Q_j(X)$. Hence $P_j(L)^{l_j}Q_j(L)=M(L)=0$. Multiplying by $R_j(L)$, we obtain

$$P_j(L)^{l_j}E_j=0.\qquad(**)$$

Now suppose that $v$ is in $\operatorname{image}E_j$. Then $P_j(L)^{l_j}(v)=P_j(L)^{l_j}E_j(v)=0$ by $(**)$, and hence $\operatorname{image}E_j\subseteq\ker(P_j(L)^{l_j})$. For the reverse inclusion, let $v$ be in $\ker(P_j(L)^{l_j})$. For $i\ne j$, $Q_i(X)R_i(X)=\bigl(\prod_{r\ne i,j}P_r(X)^{l_r}\bigr)R_i(X)P_j(X)^{l_j}$, and hence

$$E_i(v)=\Bigl(\prod_{r\ne i,j}P_r(L)^{l_r}\Bigr)R_i(L)P_j(L)^{l_j}(v)=0.$$

Writing $v=E_1(v)+\cdots+E_k(v)$, we see that $v=E_j(v)$. Thus $\ker(P_j(L)^{l_j})\subseteq\operatorname{image}E_j$. Therefore $(*)$ holds, and $U_j$ is as in the statement of the theorem.
Finally let us prove (f). Let $M_j(X)$ be the minimal polynomial of $L_j=L|_{U_j}$. From $(**)$ we see that $P_j(L_j)^{l_j}=0$. Hence $M_j(X)$ divides $P_j(X)^{l_j}$. For the reverse divisibility we have $M_j(L_j)=0$. Then certainly $M_j(L_j)Q_j(L_j)R_j(L_j)$, which equals $M_j(L)E_j$ on $U_j$, is 0 on $U_j$. Consider $M_j(L)E_j$ on $U_i=\operatorname{image}E_i$ when $i\ne j$. Since $E_jE_i=0$, $M_j(L)E_j$ equals 0 on all $U_i$ other than $U_j$. We conclude that $M_j(L)E_j$ equals 0 on $V$, i.e., $M_j(L)Q_j(L)R_j(L)=0$. Since $M(X)$ is the minimal polynomial of $L$, $M(X)$ divides

$$M_j(X)Q_j(X)R_j(X)=M_j(X)\Bigl(1-\sum_{i\ne j}Q_i(X)R_i(X)\Bigr),\qquad(\dagger)$$

and the factor $P_j(X)^{l_j}$ of $M(X)$ must divide the right side of $(\dagger)$. On that right side, $P_j(X)^{l_j}$ divides each $Q_i(X)$ with $i\ne j$. Since $P_j(X)$ therefore divides $\sum_{i\ne j}Q_i(X)R_i(X)$ but does not divide 1, $P_j(X)$ does not divide the factor $1-\sum_{i\ne j}Q_i(X)R_i(X)$. Since $P_j(X)$ is prime, $P_j(X)^{l_j}$ and $1-\sum_{i\ne j}Q_i(X)R_i(X)$ are relatively prime. We know that $P_j(X)^{l_j}$ divides the product of $M_j(X)$ and $1-\sum_{i\ne j}Q_i(X)R_i(X)$, and consequently $P_j(X)^{l_j}$ divides $M_j(X)$. This proves the reverse divisibility and completes the proof of (f). $\square$
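The proof of Theorem 5.19 is constructive once the factorization of $M(X)$ is known. The $R_j$ can be found by the extended Euclidean algorithm in $\mathbb{Q}[X]$, applied repeatedly to fold in one $Q_j$ at a time. The following sketch returns $E_j=Q_j(A)R_j(A)$. All names in it are hypothetical, polynomials are stored as coefficient lists with the lowest degree first, and arithmetic is exact via `fractions.Fraction`. The $R_j$ it finds depend on the algorithm, but the $E_j$ do not, because each $E_j$ is the projection determined by the decomposition in (a).

```python
from fractions import Fraction

# Polynomials: lists of Fractions, lowest degree first, e.g. X - 2 is [-2, 1].


def trim(p):
    p = list(p)
    while len(p) > 1 and p[-1] == 0:
        p.pop()
    return p


def padd(p, q):
    n = max(len(p), len(q))
    return trim([(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0) for i in range(n)])


def pmul(p, q):
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return trim(out)


def pdivmod(a, b):
    """Long division a = q*b + r with deg r < deg b (b nonzero)."""
    a, b = trim([Fraction(c) for c in a]), trim([Fraction(c) for c in b])
    q, r = [Fraction(0)] * max(len(a) - len(b) + 1, 1), a
    while len(r) >= len(b) and any(r):
        shift, coef = len(r) - len(b), r[-1] / b[-1]
        q[shift] = coef
        r = trim([c - coef * b[i - shift] if 0 <= i - shift < len(b) else c
                  for i, c in enumerate(r)])
    return trim(q), r


def pext_gcd(a, b):
    """Return (g, s, t) with s*a + t*b = g and g monic."""
    r0, r1 = trim([Fraction(c) for c in a]), trim([Fraction(c) for c in b])
    s0, s1, t0, t1 = [Fraction(1)], [Fraction(0)], [Fraction(0)], [Fraction(1)]
    while any(r1):
        q, r = pdivmod(r0, r1)
        r0, r1 = r1, r
        s0, s1 = s1, padd(s0, [-c for c in pmul(q, s1)])
        t0, t1 = t1, padd(t0, [-c for c in pmul(q, t1)])
    lead = r0[-1]
    return [c / lead for c in r0], [c / lead for c in s0], [c / lead for c in t0]


def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def peval(p, A):
    """p(A) by Horner's rule."""
    n = len(A)
    out = [[Fraction(0)] * n for _ in range(n)]
    for c in reversed(p):
        out = mat_mul(out, A)
        for i in range(n):
            out[i][i] += c
    return out


def primary_projections(A, factors):
    """factors = [P_1^l_1, ..., P_k^l_k], the prime-power factors of M(X)."""
    k = len(factors)
    Qs = []
    for j in range(k):
        Q = [Fraction(1)]
        for i, f in enumerate(factors):
            if i != j:
                Q = pmul(Q, f)
        Qs.append(Q)
    # Fold in one Q_j at a time, keeping sum_i R_i Q_i = g.
    g, Rs = Qs[0], [[Fraction(1)]] + [[Fraction(0)] for _ in range(k - 1)]
    for j in range(1, k):
        g, s, t = pext_gcd(g, Qs[j])
        Rs = [pmul(s, R) for R in Rs]
        Rs[j] = t
    assert g == [1], "the Q_j must generate the unit ideal"
    return [mat_mul(peval(Qs[j], A), peval(Rs[j], A)) for j in range(k)]
```

For $A=\begin{pmatrix}1&1&0\\0&1&0\\0&0&2\end{pmatrix}$, whose minimal polynomial is $(X-1)^2(X-2)$, the sketch gives $E_1=\operatorname{diag}(1,1,0)$ and $E_2=\operatorname{diag}(0,0,1)$. For $B=\begin{pmatrix}0&-1&0\\1&0&0\\0&0&1\end{pmatrix}$ over $\mathbb{Q}$, the factor $X^2+1$ is a prime of degree 2, and the same code returns the same two diagonal projections.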


6.  Jordan  Canonical  Form 

Now  we  can  return  to  the  canonical-form  problem  for  similarity  of  square  matrices 
and  isomorphism  of  linear  maps  from  a  finite-dimensional  vector  space  to  itself. 
The answer obtained in this section will solve the problem completely if $\mathbb{K}$ is algebraically closed but only partially if $\mathbb{K}$ fails to be algebraically closed. Problems 32-40 at the end of the chapter extend the content of this section to give a complete answer for general $\mathbb{K}$.

The present theorem is most easily stated in terms of matrices. A square matrix is called a Jordan block if it is of the form

$$\begin{pmatrix}c&1&0&\cdots&0&0\\&c&1&\cdots&0&0\\&&c&\ddots&\vdots&\vdots\\&&&\ddots&1&0\\&&&&c&1\\&&&&&c\end{pmatrix}$$

of some size and for some $c$ in $\mathbb{K}$, as in Example 2 of Section 3, with 0 everywhere below the diagonal. A square matrix is in Jordan form, or Jordan normal form, if it is block diagonal and each block is a Jordan block. One can insist on grouping the blocks for which the constant $c$ is the same and arranging the blocks for given $c$ in some order, but these refinements are inessential.
Theorem 5.20 (Jordan canonical form).

(a)  If  the  field  K  is  algebraically  closed,  then  every  square  matrix  over  K  is 
similar  to  a  matrix  in  Jordan  form,  and  two  matrices  in  Jordan  form  are  similar 
to  each  other  if  and  only  if  their  Jordan  blocks  can  be  permuted  so  as  to  match 
exactly. 

(b)  For  a  general  field  K,  a  square  matrix  A  is  similar  to  a  matrix  in  Jordan 
form  if  and  only  if  each  prime  factor  of  its  minimal  polynomial  has  degree  1. 
Two  matrices  in  Jordan  form  are  similar  to  each  other  if  and  only  if  their  Jordan 
blocks  can  be  permuted  so  as  to  match  exactly. 

The first step in proving existence of a matrix in Jordan form similar to a given matrix is to use the Primary Decomposition Theorem (Theorem 5.19). We think of the matrix $A$ as operating on the space $\mathbb{K}^n$ of column vectors in the usual way. The primary subspaces are uniquely defined vector subspaces of $\mathbb{K}^n$, and we introduce an ordered basis, yet to be specified in full detail, within each primary subspace. The union of these ordered bases gives an ordered basis of $\mathbb{K}^n$, and we change from the standard basis to this one. The result is that the given matrix has been conjugated so that its appearance is block diagonal, each block having minimal polynomial equal to a power of a prime polynomial and the prime polynomials all being different. Let us call these blocks primary blocks. The effect of Theorem 5.19 has been to reduce matters to a consideration of each primary block separately. The hypothesis either that $\mathbb{K}$ is algebraically closed or, more generally, that the prime divisors of the minimal polynomial all have degree 1 means that the minimal polynomial of the primary block under study may be taken to be $(X-c)^l$ for some $c$ in $\mathbb{K}$ and some integer $l\ge1$. In terms of Jordan form, we have isolated, for each $c$ in $\mathbb{K}$, what will turn out to be the subspace of $\mathbb{K}^n$ corresponding to Jordan blocks with $c$ in every diagonal entry.

Let us write $B$ for a primary block with minimal polynomial $(X-c)^l$. We certainly have $(B-cI)^l=0$, and it follows that the matrix $N=B-cI$ has $N^l=0$. A matrix $N$ with $N^l=0$ for some integer $l>0$ is said to be nilpotent. To prove the existence part of Theorem 5.20, it is enough to prove the following theorem.

Theorem 5.21. For any field $\mathbb{K}$, each nilpotent matrix $N$ in $M_n(\mathbb{K})$ is similar to a matrix in Jordan form.

The proof of Theorem 5.21 and of the uniqueness statements in Theorem 5.20 will occupy the remainder of this section. It is implicit in Theorem 5.21 that a nilpotent matrix in $M_n(\mathbb{K})$ has 0 as a root of its characteristic polynomial with multiplicity $n$, in particular that the only prime polynomials dividing the characteristic polynomial are the ones dividing the minimal polynomial. We proved such a fact about divisibility earlier for general square matrices when the prime factor has degree 1, but we did not give a proof for general degree. We pause for a moment to give a direct proof in the nilpotent case.


Lemma 5.22. If $N$ is a nilpotent matrix in $M_n(\mathbb{K})$, then $N$ has characteristic polynomial $X^n$ and satisfies $N^n=0$.

PROOF. If $N^l=0$, then

$$(XI-N)(X^{l-1}I+X^{l-2}N+\cdots+XN^{l-2}+N^{l-1})=X^lI-N^l=X^lI.$$

Taking determinants and using Proposition 5.1 in the ring $R=\mathbb{K}[X]$, we obtain

$$\det(XI-N)\,\det(\text{other factor})=\det(X^lI)=X^{ln}.$$

Thus $\det(XI-N)$ divides $X^{ln}$. By unique factorization in $\mathbb{K}[X]$, $\det(XI-N)$ is a constant times a power of $X$. Since $\det(XI-N)$ is monic of degree $n$, we must have $\det(XI-N)=X^n$. Applying the Cayley-Hamilton Theorem (Theorem 5.9), we obtain $N^n=0$. $\square$
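Lemma 5.22 can be spot-checked by computing a characteristic polynomial exactly. The sketch below uses the Faddeev-LeVerrier recurrence, a standard method that is not the determinant argument of the text. Because it divides by $k$, it is valid only in characteristic 0, here $\mathbb{Q}$. The test matrix is an illustrative choice: $N=vw^{\mathsf T}$ with $v=(1,2,3)$ and $w=(1,1,-1)$, so that $w^{\mathsf T}v=0$ and $N^2=0$.

```python
from fractions import Fraction


def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def char_poly(A):
    """Coefficients [c0, c1, ..., cn] of det(XI - A), lowest degree first.

    Faddeev-LeVerrier recurrence: M_k = A M_{k-1} + c_{n-k+1} I and
    c_{n-k} = -trace(A M_k) / k. Needs characteristic 0 (exact rationals here).
    """
    n = len(A)
    A = [[Fraction(x) for x in row] for row in A]
    c = [Fraction(0)] * (n + 1)
    c[n] = Fraction(1)
    M = [[Fraction(0)] * n for _ in range(n)]
    for k in range(1, n + 1):
        AM = mat_mul(A, M)
        M = [[AM[i][j] + (c[n - k + 1] if i == j else 0) for j in range(n)]
             for i in range(n)]
        AM = mat_mul(A, M)
        c[n - k] = -sum(AM[i][i] for i in range(n)) / k
    return c


def mat_power(A, e):
    n = len(A)
    P = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(e):
        P = mat_mul(P, A)
    return P
```

For that $N$ the coefficient list is `[0, 0, 0, 1]`, that is, $X^3$, and $N^3=0$. A non-nilpotent matrix such as `[[1, 2], [3, 4]]` gives `[-2, -5, 1]`, that is, $X^2-5X-2$.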


Let us now prove the uniqueness statements in Theorem 5.20; this step will in fact help orient us for the proof of Theorem 5.21. In (b), one thing we are to prove is that if $A$ is similar to a matrix in Jordan form, then every prime polynomial dividing the minimal polynomial has degree 1. Since characteristic and minimal polynomials are unchanged under similarity, we may assume that $A$ is itself in Jordan form. The characteristic and minimal polynomials of $A$ are computed in the four examples of Section 3. Since the minimal polynomial is the product of polynomials of degree 1, the only primes dividing it have degree 1.

In  both  (a)  and  (b)  of  Theorem  5.20,  we  are  to  prove  that  the  Jordan  form 
is  unique  up  to  permutation  of  the  Jordan  blocks.  The  matrix  A  determines 
its  characteristic  polynomial,  which  determines  the  roots  of  the  characteristic 
polynomial,  which  are  the  diagonal  entries  of  the  Jordan  form.  Thus  the  sizes 
of  the  primary  blocks  within  the  Jordan  form  are  determined  by  A.  Within  each 
primary  block,  we  need  to  see  that  the  sizes  of  the  various  Jordan  blocks  are 
completely  determined. 

Thus we may assume that $N$ is nilpotent and that $C^{-1}NC=J$ is in Jordan form with 0's on the diagonal. Although we shall make statements that apply in all cases, the reader may be helped by referring to the particular matrix $J$ in Figure 5.1 and its powers in Figure 5.2.
Each block of the Jordan form $J$ contributes 1 to the dimension of the kernel (or null space really) of $J$ via the first column of the block, and hence

$$\dim(\ker J)=\#\{\text{Jordan blocks in }J\}.$$

In Figure 5.1 this number is 5.


When $J$ is squared, the 1's in $J$ move up and to the right one more step beyond the diagonal except that blocks of size 2 become 0. When $J$ is cubed, the 1's in $J$ move up and to the right one further step except that blocks of size 3 become 0. Each time $J$ is raised to a new power one higher than before, each block that is nonzero in the old power contributes an additional 1 to the dimension of the kernel. Thus we have

$$\dim(\ker J^2)-\dim(\ker J)=\#\{\text{Jordan blocks of size}\ge2\}$$

and

$$\dim(\ker J^3)-\dim(\ker J^2)=\#\{\text{Jordan blocks of size}\ge3\};$$

in the general case,

$$\dim(\ker J^k)-\dim(\ker J^{k-1})=\#\{\text{Jordan blocks of size}\ge k\}\quad\text{for }k\ge1.$$

Lemma 5.22 says that $J^k=0$ when $k$ is $\ge$ the size of $J$, and the differences need not be computed beyond that point.

For Figure 5.2 the values by inspection are $\dim(\ker J^2)=9$ and $\dim(\ker J^3)=11$; also $J^4=0$ and hence $\dim(\ker J^4)=12$. The numbers of Jordan blocks of size $\ge k$ for $k=1,2,3,4$ are 5, 4, 2, 1, and these numbers indeed match the differences $5-0$, $9-5$, $11-9$, $12-11$, as predicted by the above formula.

Since $C^{-1}NC=J$, we have $C^{-1}N^kC=J^k$ and $N^kC=CJ^k$. The matrix $C$ is invertible, and therefore $\dim(\ker J^k)=\dim(\ker CJ^k)=\dim(\ker N^kC)=\dim(\ker N^k)$. Hence

$$\dim(\ker N^k)-\dim(\ker N^{k-1})=\#\{\text{Jordan blocks of size}\ge k\}\quad\text{for }k\ge1,$$

and the number of Jordan blocks of each size is uniquely determined by properties of $N$. This completes the proof of all the uniqueness statements in Theorem 5.20.
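The uniqueness argument is also an algorithm. The number of Jordan blocks of size exactly $k$ is the number of size $\ge k$ minus the number of size $\ge k+1$, and both counts come from ranks of powers of $N$. The sketch below recovers the block sizes of the matrix $J$ of Figure 5.1. Its blocks have sizes 4, 3, 2, 2, 1, as read off from the list of values $Je_i$ in the text. The names are hypothetical, and ranks are computed exactly over $\mathbb{Q}$ by Gaussian elimination.

```python
from fractions import Fraction


def rank(M):
    """Rank over Q by Gaussian elimination on a copy of M."""
    M = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(M[0])):
        p = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if p is None:
            continue
        M[r], M[p] = M[p], M[r]
        for i in range(r + 1, len(M)):
            f = M[i][c] / M[r][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r


def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def kernel_dims(N):
    """[dim ker N^0, dim ker N^1, ..., dim ker N^n]."""
    n = len(N)
    power = [[int(i == j) for j in range(n)] for i in range(n)]
    dims = [0]
    for _ in range(n):
        power = mat_mul(power, N)
        dims.append(n - rank(power))
    return dims


def jordan_block_counts(N):
    """Map size -> number of Jordan blocks of that size, for nilpotent N."""
    dims, n = kernel_dims(N), len(N)
    # at_least[k] = #blocks of size >= k = dim ker N^k - dim ker N^(k-1)
    at_least = [0] + [dims[k] - dims[k - 1] for k in range(1, n + 1)] + [0]
    return {k: at_least[k] - at_least[k + 1]
            for k in range(1, n + 1) if at_least[k] > at_least[k + 1]}


def jordan_matrix(sizes):
    """Nilpotent matrix in Jordan form with blocks of the given sizes."""
    n = sum(sizes)
    J = [[0] * n for _ in range(n)]
    start = 0
    for s in sizes:
        for i in range(start, start + s - 1):
            J[i][i + 1] = 1
        start += s
    return J
```

For `J = jordan_matrix([4, 3, 2, 2, 1])` the kernel dimensions of $J,J^2,J^3,J^4$ are 5, 9, 11, 12, matching the values quoted for Figure 5.2. The counts come out as one block each of sizes 4, 3, 1 and two blocks of size 2.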

Now let us turn to the proof of Theorem 5.21, first giving the idea. The argument involves a great many choices, and it may be helpful to understand it in the context of Figures 5.1 and 5.2. Let $E=(e_1,\dots,e_{12})$ be the standard ordered basis of $\mathbb{K}^{12}$. The matrix $J$, when operating by multiplication on the left, moves basis vectors to other basis vectors or to 0. Namely,


$$\begin{aligned}&Je_1=0,\quad Je_2=e_1,\quad Je_3=e_2,\quad Je_4=e_3,\\&Je_5=0,\quad Je_6=e_5,\quad Je_7=e_6,\\&Je_8=0,\quad Je_9=e_8,\\&Je_{10}=0,\quad Je_{11}=e_{10},\\&Je_{12}=0,\end{aligned}$$


with each line describing what happens for a single Jordan block. Let us think of the given nilpotent matrix $N$ as the matrix, relative to the ordered basis $E$, of some linear map $L$. We want to find a new ordered basis $\Gamma=(v_1,\dots,v_{12})$ in which the matrix of $L$ is $J$. In the expression $C^{-1}NC=J$, the matrix $C$ is the matrix of the identity map relative to the ordered bases $\Gamma$ (for the domain) and $E$ (for the range), and its columns are expressions for $v_1,\dots,v_{12}$ in the basis $E$, i.e., $Ce_i=v_i$. For each index $i$, we have $Je_i=e_{i-1}$ or $Je_i=0$. The formula $NC=CJ$, when applied to $e_i$, therefore says that

$$Nv_i=NCe_i=CJe_i=\begin{cases}Ce_{i-1}=v_{i-1}&\text{if }Je_i=e_{i-1},\\0&\text{if }Je_i=0.\end{cases}$$


Thus we are looking for an ordered basis such that $N$ sends each member of the basis either into the previous member or into 0. The procedure in this example will be to pick out $v_4$ as a vector not annihilated by $N^3$, obtain $v_3,v_2,v_1$ from it by successively applying $N$, pick out $v_7$ as a vector not annihilated by $N^2$ and independent of what has been found, obtain $v_6,v_5$ from it by successively applying $N$, and so on. It is necessary to check that the appropriate linear independence can be maintained, and that step will be what the proof is really about.

The proof of Theorem 5.21 will now be given in the general case. The core of the argument concerns linear maps and appears as three lemmas. Afterward the results of the lemmas will be interpreted in terms of matrices. For all the lemmas let $V$ be an $n$-dimensional vector space over $\mathbb{K}$, and let $N:V\to V$ be linear with $N^n=0$. Define $K_j=\ker N^j$, so that

$$0=K_0\subseteq K_1\subseteq K_2\subseteq\cdots\subseteq K_n=V.$$

Lemma 5.23. Suppose $j\ge1$ and suppose $S_j$ is any vector subspace of $V$ such that $K_{j+1}=K_j\oplus S_j$. Then $N$ is one-one from $S_j$ into $K_j$ and $N(S_j)\cap K_{j-1}=0$.

PROOF. Since $N(\ker N^{j+1})\subseteq\ker N^j$, we obtain $N(S_j)\subseteq K_j$; thus $N$ indeed sends $S_j$ into $K_j$. To see that $N$ is one-one from $S_j$ into $K_j$, suppose that $s$ is a member of $S_j$ with $N(s)=0$. Then $s$ is in $K_1$. Since $j\ge1$, $K_1\subseteq K_j$. Thus $s$ is in $K_j$. Since $K_j\cap S_j=0$, $s$ is 0. Hence $N$ is one-one from $S_j$ into $K_j$. To see that $N(S_j)\cap K_{j-1}=0$, suppose $s$ is a member of $S_j$ with $N(s)$ in $K_{j-1}$. Then $0=N^{j-1}(N(s))=N^j(s)$ shows that $s$ is in $K_j$. Since $K_j\cap S_j=0$, $s$ equals 0. $\square$

Lemma 5.24. Define $U_n=W_n=0$. For $0\le j\le n-1$, there exist vector subspaces $U_j$ and $W_j$ of $K_{j+1}$ such that

$$K_{j+1}=K_j\oplus U_j\oplus W_j,\qquad U_j=N(U_{j+1}\oplus W_{j+1}),$$

and $N:U_{j+1}\oplus W_{j+1}\to U_j$ is one-one.

PROOF. Define $U_{n-1}=N(U_n\oplus W_n)=0$, and let $W_{n-1}$ be a vector subspace such that $V=K_n=K_{n-1}\oplus W_{n-1}$. Put $S_{n-1}=U_{n-1}\oplus W_{n-1}$. Proceeding inductively downward, suppose that $U_n,U_{n-1},\dots,U_{j+1}$ and $W_n,W_{n-1},\dots,W_{j+1}$ have been defined so that $U_k=N(U_{k+1}\oplus W_{k+1})$, $N:U_{k+1}\oplus W_{k+1}\to U_k$ is one-one, and $K_{k+1}=K_k\oplus U_k\oplus W_k$ whenever $k$ satisfies $j<k\le n-1$. We put $S_k=U_k\oplus W_k$ for these values of $k$, and then $S_k$ satisfies the hypothesis of Lemma 5.23 whenever $k$ satisfies $j<k\le n-1$. We now construct $U_j$ and $W_j$. We put $U_j=N(S_{j+1})$. Since $S_{j+1}$ satisfies the hypothesis of Lemma 5.23, we see that $U_j\subseteq K_{j+1}$, $N$ is one-one from $S_{j+1}$ into $U_j$, and $U_j\cap K_j=0$. Thus we can find a vector subspace $W_j$ with $K_{j+1}=K_j\oplus U_j\oplus W_j$, and the inductive construction is complete. $\square$
Lemma 5.25. The vector subspaces of Lemma 5.24 satisfy

$$V=U_0\oplus W_0\oplus U_1\oplus W_1\oplus\cdots\oplus U_{n-1}\oplus W_{n-1}.$$

PROOF. Iterated use of Lemma 5.24 gives

$$\begin{aligned}V=K_n&=K_{n-1}\oplus(U_{n-1}\oplus W_{n-1})\\&=K_{n-2}\oplus(U_{n-2}\oplus W_{n-2})\oplus(U_{n-1}\oplus W_{n-1})\\&=\cdots=K_0\oplus(U_0\oplus W_0)\oplus\cdots\oplus(U_{n-1}\oplus W_{n-1})\\&=(U_0\oplus W_0)\oplus\cdots\oplus(U_{n-1}\oplus W_{n-1}),\end{aligned}$$

the last step holding since $K_0=0$, $K_0$ being the kernel of the identity function. $\square$


PROOF OF THEOREM 5.21. We regard $N$ as acting on $V=\mathbb{K}^n$ by multiplication on the left, and we describe an ordered basis in which the matrix of $N$ is in Jordan form. For $0\le j\le n-1$, form a basis of the vector subspace $W_j$ of Lemma 5.24, and let $v^{(j)}$ be a typical member of this basis. Each $v^{(j)}$ will be used as the last basis vector corresponding to a Jordan block of size $j+1$. The full ordered basis for that Jordan block will therefore be $N^jv^{(j)},N^{j-1}v^{(j)},\dots,Nv^{(j)},v^{(j)}$. The theorem will be proved if we show that the union of these sets as $j$ and $v^{(j)}$ vary is a basis of $\mathbb{K}^n$ and that $N^{j+1}v^{(j)}=0$ for all $j$ and $v^{(j)}$.

From the first conclusion of Lemma 5.24 we see for $j\ge0$ that $W_j\subseteq K_{j+1}$, and hence $N^{j+1}(W_j)=0$. Therefore $N^{j+1}v^{(j)}=0$ for all $j$ and $v^{(j)}$.

Let us prove by induction downward on $j$ that a basis of $U_j\oplus W_j$ consists of all $v^{(j)}$ and all $N^kv^{(j+k)}$ for $k>0$. The base case of the induction is $j=n-1$, and the statement holds in that case since $U_{n-1}=0$ and since the vectors $v^{(n-1)}$ form a basis of $W_{n-1}$. The inductive hypothesis is that all $v^{(j+1)}$ and all $N^kv^{(j+1+k)}$ for $k>0$ together form a basis of $U_{j+1}\oplus W_{j+1}$. The second and third conclusions of Lemma 5.24 together show that all $Nv^{(j+1)}$ and all $N^{k+1}v^{(j+1+k)}$ for $k>0$ together form a basis of $U_j$. In other words, all $N^kv^{(j+k)}$ with $k>0$ together form a basis of $U_j$. The vectors $v^{(j)}$ by construction form a basis of $W_j$, and $U_j\cap W_j=0$. Therefore the union of these separate bases is a basis for $U_j\oplus W_j$, and the induction is complete.

Taking the union of the bases of $U_j\oplus W_j$ for all $j$ and applying Lemma 5.25, we see that we have a basis of $V=\mathbb{K}^n$. This shows that the desired set is a basis of $\mathbb{K}^n$ and completes the proof of Theorem 5.21. $\square$




7.  Computations  with  Jordan  Form 

Let us illustrate the computation of Jordan form and the change-of-basis matrix with a few examples. We are given a matrix $A$ and we seek $J$ and $C$ with $J = C^{-1}AC$. We regard $A$ as the matrix of some linear map $L$ in the standard ordered basis $E$, and we regard $J$ as the matrix of $L$ in some other ordered basis $F$. Then $C = \begin{pmatrix} I \\ E\,F \end{pmatrix}$, the matrix of the identity map relative to $F$ in the domain and $E$ in the range, and so the columns of $C$ give the members of $F$ written as ordinary column vectors (in the standard ordered basis).


Example 1. This example will be a nilpotent matrix, and we shall compute $J$ and $C$ merely by interpreting the proof of Theorem 5.21 in concrete terms. Let

$$A = \begin{pmatrix} -1 & 1 & 0 \\ -1 & 1 & 0 \\ -1 & 1 & 0 \end{pmatrix}.$$

The first step is to compute the characteristic polynomial, which is

$$\det(XI - A) = \det\begin{pmatrix} X+1 & -1 & 0 \\ 1 & X-1 & 0 \\ 1 & -1 & X \end{pmatrix} = X \det\begin{pmatrix} X+1 & -1 \\ 1 & X-1 \end{pmatrix} = X^3.$$

Then $A^3 = 0$ by the Cayley-Hamilton Theorem (Theorem 5.9), and $A$ is indeed nilpotent. The diagonal entries of $J$ are thus all 0, and we have to compute the sizes of the various Jordan blocks. To do so, we compute the dimension of the kernel of each power of $A$. The dimension of the kernel of a matrix equals the number of independent variables when we solve $AX = 0$ by row reduction. With the first power of $A$, the variable $x_1$ is dependent, and $x_2$ and $x_3$ are independent. Also, $A^2 = 0$. Thus

$$\dim(\ker A^0) = 0, \qquad \dim(\ker A) = 2, \qquad \text{and} \qquad \dim(\ker A^2) = 3.$$

Hence

$$\begin{aligned}
\#\{\text{Jordan blocks of size} \ge 1\} &= \dim(\ker A) - \dim(\ker A^0) = 2 - 0 = 2, \\
\#\{\text{Jordan blocks of size} \ge 2\} &= \dim(\ker A^2) - \dim(\ker A) = 3 - 2 = 1.
\end{aligned}$$

From these equalities we see that one Jordan block has size 2 and the other has size 1. Thus

$$J = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}.$$
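The kernel-dimension bookkeeping above is mechanical, so it is easy to check by machine. Here is a minimal Python sketch (standard library only; the helpers `matmul`, `rank`, `jordan_block_counts`, and `block_sizes` are illustrative names of our own, not taken from the text), assuming the input matrix is nilpotent with rational entries. It computes $\dim\ker A^j$ by exact row reduction and returns, for each $j$, the number of Jordan blocks of size at least $j$.

```python
from fractions import Fraction


def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def rank(M):
    """Rank by Gaussian elimination over the rationals (exact, no rounding)."""
    rows = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c] / rows[r][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def jordan_block_counts(N):
    """Return {j: number of Jordan blocks of size >= j} for a nilpotent N."""
    n = len(N)
    dims = [0]                                               # dim ker N^0 = 0
    P = [[int(i == j) for j in range(n)] for i in range(n)]  # N^0 = I
    for _ in range(n):                                       # N^n = 0 if N is nilpotent
        if dims[-1] == n:
            break
        P = matmul(P, N)
        dims.append(n - rank(P))                             # dim ker N^j
    if dims[-1] != n:
        raise ValueError("matrix is not nilpotent")
    return {j: dims[j] - dims[j - 1] for j in range(1, len(dims))}


def block_sizes(N):
    """Sizes of the Jordan blocks of a nilpotent N, largest first."""
    counts = jordan_block_counts(N)
    sizes = []
    for j in sorted(counts, reverse=True):
        sizes += [j] * (counts[j] - counts.get(j + 1, 0))
    return sizes


A = [[-1, 1, 0], [-1, 1, 0], [-1, 1, 0]]   # the matrix A of Example 1
print(jordan_block_counts(A))  # {1: 2, 2: 1}
print(block_sizes(A))          # [2, 1]
```

Exact `Fraction` arithmetic is used instead of floating point so that rank computations are not disturbed by rounding.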
We want to set up vector subspaces as in Lemma 5.24 so that $K_{j+1} = K_j \oplus U_j \oplus W_j$ and $U_j = A(U_{j+1} \oplus W_{j+1})$ for $0 \le j < 2$. Since $K_3 = K_2$, the equations begin with $K_2 = \cdots$ and are

$$K_2 = K_1 \oplus 0 \oplus W_1, \qquad U_0 = A(0 \oplus W_1), \qquad K_1 = K_0 \oplus U_0 \oplus W_0.$$

Here $K_2 = \mathbb{K}^3$ and $K_1$ is the subspace of all $X = (x_1, x_2, x_3)^t$ such that $AX = 0$.

The space $W_1$ is to satisfy $K_2 = K_1 \oplus W_1$, and we see that $W_1$ is 1-dimensional. Let $\{v^{(1)}\}$ be a basis of the 1-dimensional vector subspace $W_1$. Then $U_0$ is 1-dimensional with basis $\{Av^{(1)}\}$. The subspace $K_1$ is 2-dimensional and contains $U_0$. The space $W_0$ is to satisfy $K_1 = U_0 \oplus W_0$, and we see that $W_0$ is 1-dimensional. Let $\{v^{(0)}\}$ be a basis of $W_0$. Then the respective columns of $C$ may be taken to be

$$Av^{(1)}, \quad v^{(1)}, \quad v^{(0)}.$$


Let  us  compute  these  vectors. 

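If we extend a basis of $K_1$ to a basis of $K_2$, then $W_1$ may be taken to be the linear span of the added vector. To obtain a basis of $K_1$, we compute that the reduced row-echelon form of $A$ is $\begin{pmatrix} 1 & -1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}$, and the resulting system consists of the single equation $x_1 - x_2 = 0$. Thus $x_1 = x_2$, and

$$\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = x_2 \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} + x_3 \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}.$$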

The coefficient vectors of $x_2$ and $x_3$ on the right side form a basis of $K_1$, and we are to choose a vector that is not a linear combination of these. Thus we can take $v^{(1)} = (1, 0, 0)^t$ as the basis vector of $W_1$. Then $U_0 = A(W_1)$ has $Av^{(1)} = A(1, 0, 0)^t = (-1, -1, -1)^t$ as a basis, and the basis of $W_0$ may be taken as any vector in $K_1$ but not $U_0$. We can take this basis to consist of $v^{(0)} = (0, 0, 1)^t$. Lining up our three basis vectors as the columns of $C$ gives us

$$C = \begin{pmatrix} -1 & 1 & 0 \\ -1 & 0 & 0 \\ -1 & 0 & 1 \end{pmatrix}.$$

Computation gives $C^{-1} = \begin{pmatrix} 0 & -1 & 0 \\ 1 & -1 & 0 \\ 0 & -1 & 1 \end{pmatrix}$, and we readily check that $C^{-1}AC = J$.
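As a check on the arithmetic in Example 1, the following Python sketch (helper names are our own) builds $C$ from the chosen vectors $v^{(1)} = (1,0,0)^t$ and $v^{(0)} = (0,0,1)^t$, inverts it by Gauss-Jordan elimination over the rationals, and confirms that $C^{-1}AC = J$.

```python
from fractions import Fraction


def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def inverse(M):
    """Inverse of a square matrix by Gauss-Jordan elimination over the rationals."""
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for c in range(n):
        pivot = next((i for i in range(c, n) if aug[i][c] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        aug[c], aug[pivot] = aug[pivot], aug[c]
        p = aug[c][c]
        aug[c] = [x / p for x in aug[c]]          # make the pivot equal to 1
        for i in range(n):
            if i != c and aug[i][c] != 0:         # clear the rest of column c
                f = aug[i][c]
                aug[i] = [a - f * b for a, b in zip(aug[i], aug[c])]
    return [row[n:] for row in aug]


A = [[-1, 1, 0], [-1, 1, 0], [-1, 1, 0]]
v1 = [1, 0, 0]                                   # chosen basis vector v^(1) of W_1
v0 = [0, 0, 1]                                   # chosen basis vector v^(0) of W_0
Av1 = [sum(A[i][k] * v1[k] for k in range(3)) for i in range(3)]
C = [[Av1[i], v1[i], v0[i]] for i in range(3)]   # columns A v^(1), v^(1), v^(0)

Cinv = inverse(C)
J = matmul(matmul(Cinv, A), C)
print(C)                                         # [[-1, 1, 0], [-1, 0, 0], [-1, 0, 1]]
print(J == [[0, 1, 0], [0, 0, 0], [0, 0, 0]])    # True
```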


Example 2. We continue with $A$ and $J$ as in Example 1, but we compute the columns of $C$ without directly following the proof of Theorem 5.21. The method starts from the fact that each Jordan block corresponds to a 1-dimensional space of eigenvectors, and then we backtrack to find vectors corresponding to the other columns. For this particular $A$, we know that the three columns of $C$ are to be of the form $v_1 = Av^{(1)}$, $v_2 = v^{(1)}$, and $v_3 = v^{(0)}$. The vectors $v_1$ and $v_3$ together span the 0 eigenspace of $A$. We find all the 0 eigenvectors, writing them as a two-parameter family. This eigenspace is just $K_1 = \ker A$, and we found in Example 1 that $K_1 = \{(x_2, x_2, x_3)^t\}$. One of these vectors is to be $v_1$, and it has to equal $Av_2$. Thus we solve $Av_2 = (x_2, x_2, x_3)^t$. Applying the solution procedure yields

$$\left(\begin{array}{ccc|c} 1 & -1 & 0 & -x_2 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & x_3 - x_2 \end{array}\right).$$

This system has no solutions unless $x_3 - x_2 = 0$. If we take $x_2 = x_3 = -1$, then we obtain the same first two columns of $C$ as in Example 1, and any vector in $K_1$ independent of $v_1$ may be taken as the third column.
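The condition "no solutions unless $x_3 - x_2 = 0$" is an instance of the rank test for consistency: $Av_2 = b$ has a solution exactly when appending $b$ to $A$ as an extra column does not increase the rank. A small Python sketch of that test (the names `rank` and `solvable` are our own), applied to a few members $(x_2, x_2, x_3)^t$ of $\ker A$:

```python
from fractions import Fraction


def rank(M):
    """Rank by Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c] / rows[r][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def solvable(A, b):
    """A x = b is consistent exactly when rank(A) == rank([A | b])."""
    augmented = [list(row) + [bi] for row, bi in zip(A, b)]
    return rank(A) == rank(augmented)


A = [[-1, 1, 0], [-1, 1, 0], [-1, 1, 0]]   # Example 1 again
for x2, x3 in [(-1, -1), (1, 0), (0, 1)]:
    # candidate v_1 = (x2, x2, x3)^t in ker A: is it of the form A v_2 ?
    print((x2, x3), solvable(A, [x2, x2, x3]))
# (-1, -1) True
# (1, 0) False
# (0, 1) False

v2 = [1, 0, 0]   # one solution of A v_2 = (-1, -1, -1)^t
print([sum(A[i][k] * v2[k] for k in range(3)) for i in range(3)])   # [-1, -1, -1]
```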


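Example 3. Let

$$A = \begin{pmatrix} 2 & 1 & 0 \\ -1 & 4 & 0 \\ -1 & 2 & 2 \end{pmatrix}.$$

Direct calculation shows that the characteristic polynomial is $\det(XI - A) = X^3 - 8X^2 + 21X - 18 = (X - 2)(X - 3)^2$. The possibilities for $J$ are therefore

$$\begin{pmatrix} 3 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 2 \end{pmatrix} \qquad \text{and} \qquad \begin{pmatrix} 3 & 1 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 2 \end{pmatrix};$$

the first one will be correct if the dimension of the eigenspace for the eigenvalue 3 is 2, and the second one will be correct if that dimension is 1.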

The third column of $C$ corresponds to an eigenvector for the eigenvalue 2, hence to a nonzero solution of $(A - 2I)v = 0$. The solutions are $v = k(0, 0, 1)^t$, and we can therefore use $(0, 0, 1)^t$ as the third column of $C$.

For the first two columns of $C$, we have to find $\ker(A - 3I)$ no matter which of the methods we use, the one in Example 1 or the one in Example 2. Solving the system of equations, we obtain all vectors in the space $\{z(1, 1, 1)^t\}$. The dimension of the space is 1, and the second possibility for the Jordan form is the correct one.
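The two facts that decide between the candidate Jordan forms in Example 3 can be confirmed with a short Python sketch, taking $A$ to be the matrix displayed at the start of Example 3 (it also reappears in Problem 46). For a 3-by-3 matrix, $\det(XI - A) = X^3 - (\operatorname{tr} A)X^2 + mX - \det A$, where $m$ is the sum of the three principal 2-by-2 minors, and the eigenspace for $\lambda$ has dimension $3 - \operatorname{rank}(A - \lambda I)$. The helper names are illustrative.

```python
from fractions import Fraction


def rank(M):
    """Rank by Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c] / rows[r][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def char_poly_3x3(A):
    """Coefficients [1, c2, c1, c0] of det(XI - A) = X^3 + c2 X^2 + c1 X + c0."""
    trace = A[0][0] + A[1][1] + A[2][2]
    # sum of the three principal 2-by-2 minors
    minors = sum(A[i][i] * A[j][j] - A[i][j] * A[j][i]
                 for i in range(3) for j in range(i + 1, 3))
    det = (A[0][0] * (A[1][1] * A[2][2] - A[1][2] * A[2][1])
           - A[0][1] * (A[1][0] * A[2][2] - A[1][2] * A[2][0])
           + A[0][2] * (A[1][0] * A[2][1] - A[1][1] * A[2][0]))
    return [1, -trace, minors, -det]


def eigenspace_dim(A, lam):
    """dim ker(A - lam I) = n - rank(A - lam I)."""
    n = len(A)
    shifted = [[A[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]
    return n - rank(shifted)


A = [[2, 1, 0], [-1, 4, 0], [-1, 2, 2]]   # Example 3
print(char_poly_3x3(A))      # [1, -8, 21, -18]
print(eigenspace_dim(A, 3))  # 1, so the eigenvalue 3 carries a 2-by-2 Jordan block
print(eigenspace_dim(A, 2))  # 1
```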

Following the method of Example 1 to find the columns of $C$ means that we pick a basis of this kernel and extend it to a basis of $\ker(A - 3I)^2$. A basis of $\ker(A - 3I)$ consists of the vector $(1, 1, 1)^t$. The matrix $(A - 3I)^2$ is

$$\begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & -1 & 1 \end{pmatrix},$$

and the solution procedure leads to the formula

$$\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = a \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} + c \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix}$$

for its kernel. The vector $(1, 1, 1)^t$ arises from $a = 1$ and $c = 1$. We are to make an independent choice, say $a = 1$ and $c = 0$. Then the second basis vector to use is $(1, 0, 0)^t$. This becomes the second column of $C$, and the first column then has to be $(A - 3I)(1, 0, 0)^t = (-1, -1, -1)^t$. The result is that

$$C = \begin{pmatrix} -1 & 1 & 0 \\ -1 & 0 & 0 \\ -1 & 0 & 1 \end{pmatrix}.$$

Following the method of Example 2 for this example means that we retain the entire kernel of $A - 3I$, namely all vectors $v_1 = z(1, 1, 1)^t$, as candidates for the first column of $C$. The second column $v_2$ is to satisfy $(A - 3I)v_2 = v_1$. Solving leads to $v_2 = z(-1, 0, 0)^t + c(1, 1, 1)^t$. In contrast to Example 2, there is no potentially contradictory equation. So we choose $z$ and then $c$. If we take $z = 1$ and $c = 0$, we find that the first two columns of $C$ are to be $(1, 1, 1)^t$ and $(-1, 0, 0)^t$. Then

$$C = \begin{pmatrix} 1 & -1 & 0 \\ 1 & 0 & 0 \\ 1 & 0 & 1 \end{pmatrix}.$$
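Both choices of $C$ in Example 3 can be checked without computing an inverse: when $C$ is invertible, $J = C^{-1}AC$ is equivalent to $AC = CJ$. A minimal Python sketch of that check (helper names are our own):

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def det3(M):
    """Determinant of a 3-by-3 matrix by cofactor expansion along the first row."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))


A = [[2, 1, 0], [-1, 4, 0], [-1, 2, 2]]
J = [[3, 1, 0], [0, 3, 0], [0, 0, 2]]
C_method1 = [[-1, 1, 0], [-1, 0, 0], [-1, 0, 1]]   # from the method of Example 1
C_method2 = [[1, -1, 0], [1, 0, 0], [1, 0, 1]]     # from the method of Example 2

# The first column of C_method1 is (A - 3I) applied to its second column (1, 0, 0)^t.
A3 = [[A[i][j] - 3 * (i == j) for j in range(3)] for i in range(3)]
print(matmul(A3, [[1], [0], [0]]))   # [[-1], [-1], [-1]]

for C in (C_method1, C_method2):
    # C invertible and A C = C J together mean J = C^{-1} A C
    print(det3(C) != 0 and matmul(A, C) == matmul(C, J))   # True
```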


For  any  example  in  which  we  can  factor  the  characteristic  polynomial  exactly, 
either  of  the  two  methods  used  above  will  work.  The  first  method  appears 
complicated  but  uses  numbers  throughout;  it  tends  to  be  more  efficient  with 
large  examples  involving  high-degree  minimal  polynomials.  The  second  method 
appears  direct  but  requires  solving  equations  with  symbolic  variables;  it  tends  to 
be  more  efficient  for  relatively  simple  examples. 


8.  Problems 

In  Problems  1-25  all  vector  spaces  are  assumed  finite-dimensional,  and  all  linear 
transformations  are  assumed  defined  from  such  spaces  into  themselves.  Unless 
information  is  given  to  the  contrary,  the  underlying  field  K  is  assumed  arbitrary. 

1. Let $M_{mn}(\mathbb{C})$ be the vector space of $m$-by-$n$ complex matrices. The group $GL(m, \mathbb{C}) \times GL(n, \mathbb{C})$ acts on $M_{mn}(\mathbb{C})$ by $((g, h), x) \mapsto gxh^{-1}$, where $gxh^{-1}$ denotes a matrix product. Do the following:

(a) Verify that this is indeed a group action.

(b) Prove that two members of $M_{mn}(\mathbb{C})$ lie in the same orbit if and only if they have the same rank.

(c) For each possible rank, give an example of a member of $M_{mn}(\mathbb{C})$ with that rank.

2.  Prove  that  a  member  of  Mn  (K)  is  invertible  if  and  only  if  the  constant  term  of  its 
minimal  polynomial  is  different  from  0. 

3. Suppose that $L: V \to V$ is a linear map with minimal polynomial $M(X) = P_1(X)^{l_1} \cdots P_k(X)^{l_k}$ and that $V = U \oplus W$ with $U$ and $W$ both invariant under $L$. Let $P_1(X)^{r_1} \cdots P_k(X)^{r_k}$ and $P_1(X)^{s_1} \cdots P_k(X)^{s_k}$ be the respective minimal polynomials of $L|_U$ and $L|_W$. Prove that $l_j = \max(r_j, s_j)$ for $1 \le j \le k$.

4. (a) If $A$ and $B$ are in $M_n(\mathbb{K})$, if $P(X)$ is a polynomial such that $P(AB) = 0$, and if $Q(X) = XP(X)$, prove that $Q(BA) = 0$.

(b) What can be inferred from (a) about the relationship between the minimal polynomials of $AB$ and of $BA$?

5. (a) Suppose that $D$ and $D'$ are in $M_n(\mathbb{K})$, are similar to diagonal matrices, and have $DD' = D'D$. Prove that there is a matrix $C$ such that $C^{-1}DC$ and $C^{-1}D'C$ are both diagonal.

(b) Give an example of two nilpotent matrices $N$ and $N'$ in $M_n(\mathbb{K})$ with $NN' = N'N$ such that there is no $C$ with $C^{-1}NC$ and $C^{-1}N'C$ both in Jordan form.

6. (a) Prove that the matrix of a projection is similar to a diagonal matrix. What are the eigenvalues?

(b) Give a necessary and sufficient condition for two projections involving the same $V$ to be given by similar matrices.

7. Let $E: V \to V$ and $F: V \to V$ be projections. Prove that $E$ and $F$ have

(a) the same image if and only if $EF = F$ and $FE = E$,

(b) the same kernel if and only if $EF = E$ and $FE = F$.

8. Let $E: V \to V$ and $F: V \to V$ be projections. Prove that $EF$ is a projection if $EF = FE$. Prove or disprove a converse.

9. An involution on $V$ is a linear map $U: V \to V$ such that $U^2 = I$. Show that the equation $U = 2E - I$ establishes a one-one correspondence between all projections $E$ and all involutions $U$.

9A. Explain how the proof of the converse half of Theorem 5.14 greatly simplifies once the Primary Decomposition Theorem (Theorem 5.19) is available.
10. Let $L: V \to V$ be linear. Prove that there exist vector subspaces $U$ and $W$ of $V$ such that

(i) $V = U \oplus W$,

(ii) $L(U) \subseteq U$ and $L(W) \subseteq W$,

(iii) $L$ is nilpotent on $U$,

(iv) $L$ is nonsingular on $W$.


11.  Prove  that  the  vector  subspaces  U  and  W  in  the  previous  problem  are  uniquely 
characterized  by  (i)  through  (iv). 

12. (Special case of Jordan-Chevalley decomposition) Let $L: V \to V$ be a linear map, and suppose that its minimal polynomial is of the form $M(X) = \prod_{j=1}^k (X - \lambda_j)^{l_j}$ with the $\lambda_j$ distinct. Let $V = U_1 \oplus \cdots \oplus U_k$ be the corresponding primary decomposition of $V$, and define $D: V \to V$ by $D = \lambda_1 E_1 + \cdots + \lambda_k E_k$, where $E_1, \dots, E_k$ are the projections associated with the primary decomposition. Finally put $N = L - D$. Prove that

(a) $L = D + N$,

(b) $D$ has a basis of eigenvectors,

(c) $N$ is nilpotent, i.e., has $N^{\dim V} = 0$,

(d) $DN = ND$,

(e) $D$ and $N$ are given by unique polynomials in $L$ such that each of the polynomials is equal to 0 or has degree less than the degree of $M(X)$,

(f) the minimal polynomial of $D$ is $\prod_{j=1}^k (X - \lambda_j)$,

(g) the minimal polynomial of $N$ is $X^{\max l_j}$.

13. (Special case of Jordan-Chevalley decomposition, continued) In the previous problem with $L$ given, prove that a decomposition $L = D + N$ is uniquely determined by properties (a) through (d). Avoid using (e) in the argument.

14. (a) Let $N'$ be a nilpotent square matrix of size $n'$. Prove for arbitrary $c \in \mathbb{K}$ that the characteristic polynomial of $N' + cI$ is $(X - c)^{n'}$, and deduce that the only eigenvalue of $N' + cI$ is $c$.

(b) Let $L = D + N$ be the decomposition in Problems 12 and 13 of a square matrix $L$ of size $n$. Prove that $L$ and $D$ have the same characteristic polynomial.

15.  For  the  complex  matrix  A  =  ^  j,  find  a  Jordan-form  matrix  J  and  an 
invertible  matrix  C  such  that  J  =  C-1  AC . 

16. For the complex matrix $A = \begin{pmatrix} 4 & 1 & -1 \\ -8 & -2 & 2 \\ 8 & 2 & -2 \end{pmatrix}$, find a Jordan-form matrix $J$ and an invertible matrix $C$ such that $J = C^{-1}AC$.
17. For the upper triangular matrix

$$A = \begin{pmatrix}
2 & 0 & 0 & 1 & 1 & 0 & 0 \\
  & 2 & 0 & 0 & 0 & 1 & 1 \\
  &   & 2 & 0 & 1 & 0 & 0 \\
  &   &   & 2 & 0 & 1 & 2 \\
  &   &   &   & 2 & 1 & 1 \\
  &   &   &   &   & 2 & 1 \\
  &   &   &   &   &   & 3
\end{pmatrix},$$

find a Jordan-form matrix $J$ and an invertible matrix $C$ such that $J = C^{-1}AC$.


18. (a) For $M_3(\mathbb{C})$, prove that any two matrices with the same minimal polynomial and the same characteristic polynomial must be similar.

(b) Is the same thing true for $M_4(\mathbb{C})$?

19. Suppose that $\mathbb{K}$ has characteristic 0 and that $J$ is a Jordan block with nonzero eigenvalue and with size $> 1$. Prove that there is no $n \ge 1$ such that $J^n$ is diagonal.

20. Classify up to similarity all members $A$ of $M_n(\mathbb{C})$ with $A^n = I$.

21. How many similarity classes are there of 3-by-3 matrices $A$ with entries in $\mathbb{C}$ such that $A^3 = A$? Explain.

22. Let $n \ge 2$, and let $N$ be a member of $M_n(\mathbb{K})$ with $N^n = 0$ but $N^{n-1} \ne 0$. Prove that there is no $n$-by-$n$ matrix $A$ with $A^2 = N$.

23. For a Jordan block $J$, prove that $J^t$ is similar to $J$.

24. Prove that if $A$ is in $M_n(\mathbb{C})$, then $A^t$ is similar to $A$.

25.  Let  N  be  the  2-by-2  matrix  ^  and  let  A  and  B  be  the  4-by-4  matrices 

A  =  (  0  ^  and  B  =  (  ^  N  V  Prove  that  A  and  B  are  similar. 

Problems 26-31 concern cyclic vectors. Fix a linear map $L: V \to V$ from a finite-dimensional vector space $V$ to itself. For $v$ in $V$, let $V(v)$ denote the set of all vectors $Q(L)(v)$ in $V$ for $Q(X)$ in $\mathbb{K}[X]$; $V(v)$ is a vector subspace and is invariant under $L$. If $U$ is an invariant subspace of $V$, we say that $U$ is a cyclic subspace if there is some $v$ in $U$ such that $V(v) = U$; in this case, $v$ is said to be a cyclic vector for $U$, and $U$ is called the cyclic subspace generated by $v$. For $v$ in $V$, let $I_v$ be the ideal of all polynomials $Q(X)$ in $\mathbb{K}[X]$ with $Q(L)v = 0$. The monic generator of $I_v$ is the unique monic polynomial $M_v(X)$ in $I_v$ such that $M_v(X)$ divides every member of $I_v$.

26. For $v \in V$, explain why $I_v$ is nonzero and why $M_v(X)$ therefore exists.

27. For $v \in V$, prove that

(a) the degree of the monic generator $M_v(X)$ equals the dimension of the cyclic subspace $V(v)$,

(b) the vectors $v, L(v), L^2(v), \dots, L^{\deg M_v - 1}(v)$ form a vector-space basis of $V(v)$,

(c) the minimal and characteristic polynomials of $L|_{V(v)}$ are both equal to $M_v(X)$.

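28. Suppose that $M_v(X) = c_0 + c_1 X + \cdots + c_{d-1} X^{d-1} + X^d$. Prove that the matrix of $L|_{V(v)}$ in a suitable ordered basis is

$$\begin{pmatrix}
-c_{d-1} & 1 & 0 & \cdots & 0 & 0 \\
-c_{d-2} & 0 & 1 & \cdots & 0 & 0 \\
-c_{d-3} & 0 & 0 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
-c_2 & 0 & 0 & \cdots & 1 & 0 \\
-c_1 & 0 & 0 & \cdots & 0 & 1 \\
-c_0 & 0 & 0 & \cdots & 0 & 0
\end{pmatrix}.$$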

29. Suppose that $v$ is in $V$, that $M_v(X)$ is a power of a prime polynomial $P(X)$, and that $Q(X)$ is a nonzero polynomial with $\deg Q(X) < \deg P(X)$. Prove that $V(Q(L)(v)) = V(v)$.

30. Let $P(X)$ be a prime polynomial.

(a) Prove by induction on $\dim V$ that if the minimal polynomial of $L$ is $P(X)$, then the characteristic polynomial of $L$ is a power of $P(X)$.

(b) Prove by induction on $l$ that if the minimal polynomial of $L$ is $P(X)^l$, then the characteristic polynomial of $L$ is a power of $P(X)$.

(c) Conclude that if the minimal polynomial of $L$ is a power of $P(X)$, then $\deg P(X)$ divides $\dim V$.

31.  Prove  that  every  prime  factor  of  the  characteristic  polynomial  of  L  divides  the 
minimal  polynomial  of  L. 


Problems 32-40 continue the study of cyclic vectors begun in Problems 26-31, using the same notation. The goal is to obtain a canonical-form theorem like Theorem 5.20 for $L$ but with no assumption on $\mathbb{K}$ or $P(X)$, namely that each primary subspace for $L$ is the direct sum of cyclic subspaces and the resulting decomposition is unique up to isomorphism. This result and the Fundamental Theorem of Finitely Generated Abelian Groups (Theorem 4.56) will be seen in Chapter VIII to be special cases of a single more general theorem. Still another canonical form for matrices and linear maps is an analog of the result with elementary divisors mentioned in the remarks with Theorem 4.56 and is valid here; it is called rational canonical form, but we shall not pursue it until the problems at the end of Chapter VIII. The proof in Problems 32-40 uses ideas similar to those used for Theorem 5.21 except that the hypothesis will now be that the minimal polynomial of $L$ is $P(X)^l$ with $P(X)$ prime, rather than just $X^l$. Define $K_j = \ker(P(L)^j)$ for $j \ge 0$, so that $K_0 = 0$, $K_j \subseteq K_{j+1}$ for all $j$, $K_l = V$, and each $K_j$ is an invariant subspace under $L$. Define $d = \deg P(X)$.

32. Suppose $j \ge 1$, and suppose $S_j$ is any vector subspace of $V$ such that $K_{j+1} = K_j \oplus S_j$. Prove that $P(L)$ is one-one from $S_j$ into $K_j$ and $P(L)(S_j) \cap K_{j-1} = 0$.
33. Define $U_l = W_l = 0$. For $0 \le j \le l-1$, prove that there exist vector subspaces $U_j$ and $W_j$ of $K_{j+1}$ such that

$K_{j+1} = K_j \oplus U_j \oplus W_j$,

$U_j = P(L)(U_{j+1} \oplus W_{j+1})$,

$P(L): U_{j+1} \oplus W_{j+1} \to U_j$ is one-one.

34. Prove that the vector subspaces of the previous problem satisfy

$$V = U_0 \oplus W_0 \oplus U_1 \oplus W_1 \oplus \cdots \oplus U_{l-1} \oplus W_{l-1}.$$


35. For $v \ne 0$ in $W_j$, prove that the set of all $L^r P(L)^s(v)$ with $0 \le r \le d-1$ and $0 \le s \le j$ is a vector-space basis of $V(v)$.

36. Going back over the construction in Problem 33, prove that each $W_j$ can be chosen to have a basis consisting of vectors $L^r(v_i^{(j)})$ for $1 \le i \le (\dim W_j)/d$ and $0 \le r \le d-1$.


37. Let the index $i$ used in the previous problem with $j$ be denoted by $i_j$ for $1 \le i_j \le (\dim W_j)/d$. Prove that a vector-space basis of $U_j \oplus W_j$ consists of all $L^r P(L)^k(v_{i_{j+k}}^{(j+k)})$ for $0 \le r \le d-1$, $k \ge 0$, $1 \le i_{j+k} \le (\dim W_{j+k})/d$.

38. Prove that $V$ is the direct sum of cyclic subspaces under $L$. Prove specifically that each $v_b^{(j)}$ generates a cyclic subspace and that the sum of all these vector subspaces, with $0 \le j < l$ and $1 \le b \le (\dim W_j)/d$, is a direct sum and equals $V$.

39. In the decomposition of the previous problem, each cyclic subspace generated by some $v_b^{(j)}$ has minimal polynomial $P(X)^{j+1}$. Prove that

$$\#\left\{\begin{array}{c} \text{direct summands with minimal polynomial} \\ P(X)^k \text{ for some } k \ge j+1 \end{array}\right\} = (\dim K_{j+1} - \dim K_j)/d.$$


40.  Prove  that  the  formula  of  the  previous  problem  persists  for  any  decomposition 
of  V  as  the  direct  sum  of  cyclic  subspaces,  and  conclude  from  Problem  28  that 
the  decomposition  into  cyclic  subspaces  is  unique  up  to  isomorphism. 

Problems 41-46 concern systems of ordinary differential equations with constant coefficients. The underlying field is taken to be $\mathbb{C}$, and differential calculus is used. For $A$ in $M_n(\mathbb{C})$ and $t$ in $\mathbb{R}$, define $e^{tA} = \sum_{k=0}^{\infty} \frac{t^k A^k}{k!}$. Take for granted that the series defining $e^{tA}$ converges entry by entry, that the series may be differentiated term by term to yield $\frac{d}{dt}(e^{tA}) = Ae^{tA} = e^{tA}A$, and that $e^{sA + tB} = e^{sA}e^{tB}$ if $A$ and $B$ commute.
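When $A$ is nilpotent, the series defining $e^{tA}$ is a finite sum, so it can be evaluated exactly. The Python sketch below (helper names are our own) forms exact partial sums of $\sum t^k A^k / k!$ in rational arithmetic. For the nilpotent matrix $A$ of Example 1 in Section 7, where $A^2 = 0$, it confirms at sample values that $e^{tA} = I + tA$ and that $e^{sA}e^{tA} = e^{(s+t)A}$. For a matrix that is not nilpotent, a partial sum is only an approximation to $e^{tA}$.

```python
from fractions import Fraction


def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def exp_partial_sum(A, t, terms):
    """Exact partial sum  sum_{k=0}^{terms-1} t^k A^k / k!  of the series for e^{tA}."""
    n = len(A)
    t = Fraction(t)
    term = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]  # k = 0 term: I
    total = [row[:] for row in term]
    for k in range(1, terms):
        # t^k A^k / k!  =  (t^(k-1) A^(k-1) / (k-1)!) * A * t / k
        term = [[x * t / k for x in row] for row in matmul(term, A)]
        total = [[a + b for a, b in zip(r, s)] for r, s in zip(total, term)]
    return total


A = [[-1, 1, 0], [-1, 1, 0], [-1, 1, 0]]   # nilpotent, A^2 = 0 (Example 1 of Section 7)

E = exp_partial_sum(A, Fraction(1, 2), terms=10)
I_plus_tA = [[int(i == j) + Fraction(1, 2) * A[i][j] for j in range(3)] for i in range(3)]
print(E == I_plus_tA)   # True: all terms with k >= 2 vanish

lhs = matmul(exp_partial_sum(A, Fraction(1, 3), 10), exp_partial_sum(A, Fraction(1, 2), 10))
print(lhs == exp_partial_sum(A, Fraction(5, 6), 10))   # True
```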

41.  Calculate  etA  for  A  equal  to 

» (.?;)■ 




<b>  (lob 

(c) the diagonal matrix with diagonal entries $d_1, \dots, d_n$.

42. (a) Calculate $e^{tJ}$ when $J$ is a nilpotent $n$-by-$n$ Jordan block.

(b) Use (a) to calculate $e^{tJ}$ when $J$ is a general $n$-by-$n$ Jordan block.

43. Let $y_1, \dots, y_n$ be unknown functions from $\mathbb{R}$ to $\mathbb{C}$, and let $y$ be the vector-valued function formed by arranging $y_1, \dots, y_n$ in a column. Suppose that $A$ is in $M_n(\mathbb{C})$. Prove for each vector $v \in \mathbb{C}^n$ that $y(t) = e^{tA}v$ is a solution of the system of differential equations $\frac{dy}{dt} = Ay(t)$.

44. With notation as in the previous problem and with $v$ fixed in $\mathbb{C}^n$, use $e^{-tA}y(t)$ to show, for each open interval of $t$'s containing 0, that the only solution of $\frac{dy}{dt} = Ay(t)$ on that interval such that $y(0) = v$ is $y(t) = e^{tA}v$.

45. For $C$ invertible, prove that $e^{tC^{-1}AC} = C^{-1}e^{tA}C$, and deduce a relationship between solutions of $\frac{dy}{dt} = Ay(t)$ and solutions of $\frac{dy}{dt} = (C^{-1}AC)y(t)$.

46. Let $A = \begin{pmatrix} 2 & 1 & 0 \\ -1 & 4 & 0 \\ -1 & 2 & 2 \end{pmatrix}$. Taking into account Example 3 in Section 7 and Problems 42 through 45 above, find all solutions for $t$ in $(-1, 1)$ to the system $\frac{dy}{dt} = Ay(t)$

(\ 

such  that  y(0)  =  I  2 
Multilinear Algebra


Abstract.  This  chapter  studies,  in  the  setting  of  vector  spaces  over  a  field,  the  basics  concerning 
multilinear  functions,  tensor  products,  spaces  of  linear  functions,  and  algebras  related  to  tensor 
products. 

Sections  1-5  concern  special  properties  of  bilinear  forms,  all  vector  spaces  being  assumed  to  be 
finite-dimensional.  Section  1  associates  a  matrix  to  each  bilinear  form  in  the  presence  of  an  ordered 
basis,  and  the  section  shows  the  effect  on  the  matrix  of  changing  the  ordered  basis.  It  then  addresses 
the extent to which the notion of “orthogonal complement” in the theory of inner-product spaces
applies  to  nondegenerate  bilinear  forms.  Sections  2-3  treat  symmetric  and  alternating  bilinear  forms, 
producing  bases  for  which  the  matrix  of  such  a  form  is  particularly  simple.  Section  4  treats  a  related 
subject,  Hermitian  forms  when  the  field  is  the  complex  numbers.  Section  5  discusses  the  groups  that 
leave  some  particular  bilinear  and  Hermitian  forms  invariant. 

Section  6  introduces  the  tensor  product  of  two  vector  spaces,  working  with  it  in  a  way  that  does 
not depend on a choice of basis. The tensor product has a universal mapping property, namely that bilinear
functions  on  the  product  of  the  two  vector  spaces  extend  uniquely  to  linear  functions  on  the  tensor 
product.  The  tensor  product  turns  out  to  be  a  vector  space  whose  dual  is  the  vector  space  of  all 
bilinear  forms.  One  particular  application  is  that  tensor  products  provide  a  basis-independent  way 
of  extending  scalars  for  a  vector  space  from  a  field  to  a  larger  field.  The  section  includes  a  number 
of  results  about  the  vector  space  of  linear  mappings  from  one  vector  space  to  another  that  go  hand 
in  hand  with  results  about  tensor  products.  These  have  convenient  formulations  in  the  language  of 
category theory as “natural isomorphisms.”

Section  7  begins  with  the  tensor  product  of  three  and  then  n  vector  spaces,  carefully  considering 
the  universal  mapping  property  and  the  question  of  associativity.  The  section  defines  an  algebra 
over a field as a vector space with a bilinear multiplication, not necessarily associative. If $E$ is a vector space, the tensor algebra $T(E)$ of $E$ is the direct sum over $n \ge 0$ of the $n$-fold tensor product of $E$ with itself. This is an associative algebra with a universal mapping property relative to any linear mapping of $E$ into an associative algebra $A$ with identity: the linear map extends to an algebra homomorphism of $T(E)$ into $A$ carrying 1 into 1.

Sections 8-9 define the symmetric and exterior algebras of a vector space $E$. The symmetric algebra $S(E)$ is a quotient of $T(E)$ with the following universal mapping property: any linear mapping of $E$ into a commutative associative algebra $A$ with identity extends to an algebra homomorphism of $S(E)$ into $A$ carrying 1 into 1. The symmetric algebra is commutative. Similarly the exterior algebra $\bigwedge(E)$ is a quotient of $T(E)$ with this universal mapping property: any linear mapping $l$ of $E$ into an associative algebra $A$ with identity such that $l(v)^2 = 0$ for all $v \in E$ extends to an algebra homomorphism of $\bigwedge(E)$ into $A$ carrying 1 into 1.

The  problems  at  the  end  of  the  chapter  introduce  some  other  algebras  that  are  of  importance 
in  applications,  and  the  problems  relate  some  of  these  algebras  to  tensor,  symmetric,  and  exterior 
algebras.  Among  the  objects  studied  are  Lie  algebras,  universal  enveloping  algebras,  Clifford 
algebras, Weyl algebras, Jordan algebras, and the division algebra of octonions.
1. Bilinear Forms and Matrices

This chapter will work with vector spaces over a common field of “scalars,” which will be called $\mathbb{K}$. In Section 6 a field containing $\mathbb{K}$ as a subfield will briefly play a role, and that will be called $\mathbb{L}$.

If $V$ is a vector space over $\mathbb{K}$, a bilinear form on $V$ is a function from $V \times V$ into $\mathbb{K}$ that is linear in each variable when the other variable is held fixed.

Examples. 

(1) For general $\mathbb{K}$, take $V = \mathbb{K}^n$. Any matrix $A$ in $M_n(\mathbb{K})$ determines a bilinear form by the rule $(v, w) = v^t A w$.

(2) For $\mathbb{K} = \mathbb{R}$, let $V$ be an inner-product space, in the sense of Chapter III, with inner product $(\cdot, \cdot)$. Then $(\cdot, \cdot)$ is a bilinear form on $V$.

Multilinear functionals on a vector space of row vectors, also called $k$-linear functionals or $k$-multilinear functionals, were defined in the course of working with determinants in Section II.7, and that definition transparently extends to general vector spaces. A bilinear form on a general vector space is then just a 2-linear functional. From the point of view of definitions, the words “functional” and “form” are interchangeable here, but the word “form” is more common in the bilinear case because of a certain homogeneity that it suggests and that comes closer to the surface in Corollary 6.12 and in Section 7.

For  the  remainder  of  this  section,  all  vector  spaces  will  be  finite-dimensional. 

Bilinear forms, i.e., 2-linear functionals, are of special interest relative to $k$-linear functionals for general $k$ because of their relationships with matrices and linear mappings. To begin with, each bilinear form, in the presence of an ordered basis, is given by a matrix. In more detail let $V$ be a finite-dimensional vector space, and let $(\cdot, \cdot)$ be a bilinear form on $V$. If an ordered basis $\Gamma = (v_1, \dots, v_n)$ of $V$ is specified, then the bilinear form determines the matrix $B$ with entries $B_{ij} = (v_i, v_j)$. Conversely we can recover the bilinear form from $B$ as follows: Write $v = \sum_i a_i v_i$ and $w = \sum_j b_j v_j$. Then

$$(v, w) = \Big(\sum_i a_i v_i, \sum_j b_j v_j\Big) = \sum_{i,j} a_i (v_i, v_j) b_j.$$

In other words, $(v, w) = a^t B b$, where $a = \begin{pmatrix} v \\ \Gamma \end{pmatrix}$ and $b = \begin{pmatrix} w \\ \Gamma \end{pmatrix}$ in the notation of Section II.3. Therefore

$$(v, w) = \begin{pmatrix} v \\ \Gamma \end{pmatrix}^{t} B \begin{pmatrix} w \\ \Gamma \end{pmatrix}.$$
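To see the correspondence between a bilinear form and its matrix concretely, here is a minimal Python sketch. The form `form` on $\mathbb{Q}^2$ and the ordered basis `Gamma` are hypothetical choices made for illustration. The sketch builds $B_{ij} = (v_i, v_j)$ and checks that $(v, w) = a^t B b$ when $a$ and $b$ are the coordinate columns of $v$ and $w$ relative to $\Gamma$.

```python
def form(x, y):
    """A hypothetical (non-symmetric) bilinear form on Q^2: x1*y1 + x1*y2 - 2*x2*y1."""
    return x[0] * y[0] + x[0] * y[1] - 2 * x[1] * y[0]


def combine(coeffs, basis):
    """The vector sum_i coeffs[i] * basis[i]."""
    return [sum(c * v[k] for c, v in zip(coeffs, basis)) for k in range(len(basis[0]))]


Gamma = [[1, 1], [1, -1]]                              # ordered basis (v_1, v_2)
B = [[form(vi, vj) for vj in Gamma] for vi in Gamma]   # B_ij = (v_i, v_j)
print(B)                                               # [[0, -2], [4, 2]]

a, b = [2, 3], [-1, 1]                 # coordinates of v and w relative to Gamma
v, w = combine(a, Gamma), combine(b, Gamma)
aBb = sum(a[i] * B[i][j] * b[j] for i in range(2) for j in range(2))
print(form(v, w), aBb)                 # -10 -10
```

Because the form here is not symmetric, $B$ is not symmetric either; the recovery formula does not depend on symmetry.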
Consequently we see that all bilinear forms on a finite-dimensional vector space reduce to Example 1 above, once we choose an ordered basis.

Let us examine the effect of a change of ordered basis. Suppose that $\Gamma = (v_1, \dots, v_n)$ and $\Delta = (w_1, \dots, w_n)$, and let $B$ and $C$ be the matrices of the bilinear form in these two ordered bases: $B_{ij} = (v_i, v_j)$ and $C_{ij} = (w_i, w_j)$. Let the two bases be related by $w_j = \sum_i a_{ij} v_i$, i.e., $[a_{ij}] = \begin{pmatrix} I \\ \Gamma\,\Delta \end{pmatrix}$. Then we have

$$C_{ij} = (w_i, w_j) = \Big(\sum_k a_{ki} v_k, \sum_l a_{lj} v_l\Big) = \sum_{k,l} a_{ki} a_{lj} (v_k, v_l) = \sum_{k,l} a_{ki} B_{kl} a_{lj}.$$

Translating  this  formula  into  matrix  form,  we  obtain  the  following  proposition. 

Proposition  6.1.  Let  ( • ,  • )  be  a  bilinear  form  on  a  finite-dimensional  vector 
space  V,  let  F  and  A  be  ordered  bases  of  V,  and  let  B  and  C  be  the  respective 
matrices  of  ( • ,  • )  relative  to  T  and  A.  Then 


The qualitative conclusion about the matrices may be a little unexpected. It is not that they are similar but that they are related by $C = S^t B S$ for some nonsingular square matrix $S$. In particular, $B$ and $C$ need not have the same determinant; all one can say is that $\det C = (\det S)^2 \det B$.
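To see Proposition 6.1 and the determinant remark in action, here is a small sketch in Python using exact rational arithmetic (`fractions.Fraction`). The matrices `B` and `S` are arbitrary illustrative choices, not taken from the text, and the helper names `transpose`, `matmul`, `det`, and `form` are our own. The code computes the new matrix twice: entry by entry as $C_{ij} = (w_i, w_j)$, and as the product $S^tBS$.

```python
from fractions import Fraction

def transpose(m):
    return [list(row) for row in zip(*m)]

def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]

def det(m):
    """Determinant by Gaussian elimination over the rationals."""
    a = [[Fraction(v) for v in row] for row in m]
    n, d = len(a), Fraction(1)
    for c in range(n):
        p = next((r for r in range(c, n) if a[r][c] != 0), None)
        if p is None:
            return Fraction(0)
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d

B = [[1, 2], [0, 3]]   # B_ij = (v_i, v_j); the form need not be symmetric
S = [[1, 1], [0, 2]]   # column j lists the coefficients a_ij of w_j = sum_i a_ij v_i

def form(x, y):
    """(x, y) = x^t B y for coordinate vectors x, y relative to Gamma."""
    return sum(x[k] * B[k][l] * y[l] for k in range(2) for l in range(2))

w = transpose(S)                       # w[j] = Gamma-coordinates of w_j
C_direct = [[form(w[i], w[j]) for j in range(2)] for i in range(2)]
C = matmul(matmul(transpose(S), B), S)
print(C, det(B), det(C))
```

The two computations agree, and here $\det B = 3$ while $\det C = 12 = 2^2 \cdot 3 = (\det S)^2 \det B$.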

Guided by the circle of ideas around the Riesz Representation Theorem for inner products (Theorem 3.12), let us examine what happens when we fix one of the variables of a bilinear form and work with the resulting linear map. Thus again let $(\cdot,\cdot)$ be a bilinear form on $V$. For fixed $u$ in $V$, $v \mapsto (u, v)$ is a linear functional on $V$, thus a member of the dual space $V'$ of $V$. If we write $L(u)$ for this linear functional, then $L$ is a function from $V$ to $V'$ satisfying $L(u)(v) = (u, v)$. The formula for $L$ shows that $L$ is in fact a linear function. We define the left radical, lrad, of $(\cdot,\cdot)$ to be the kernel of $L$; thus

$$\operatorname{lrad}((\cdot,\cdot)) = \{u \in V \mid (u, v) = 0 \text{ for all } v \in V\}.$$

Similarly we let $R : V \to V'$ be the linear map $R(v)(u) = (u, v)$, and we define the right radical, rrad, of $(\cdot,\cdot)$ to be the kernel of $R$; thus

$$\operatorname{rrad}((\cdot,\cdot)) = \{v \in V \mid (u, v) = 0 \text{ for all } u \in V\}.$$

Example 1, continued. The vector space $V$ is the space $\mathbb{K}^n$ of $n$-dimensional column vectors, the dual $V'$ is the space of $n$-dimensional row vectors, $A$ is an $n$-by-$n$ matrix with entries in $\mathbb{K}$, and $(\cdot,\cdot)$ is given by $(u, v) = u^tAv = L(u)(v) = R(v)(u)$ for $u$ and $v$ in $\mathbb{K}^n$. Explicit formulas for $L$ and $R$ are given by

$$L(u) = u^tA = (A^tu)^t \qquad\text{and}\qquad R(v) = (Av)^t.$$

Thus

$$\operatorname{lrad}((\cdot,\cdot)) = \ker L = \text{null space}(A^t), \qquad \operatorname{rrad}((\cdot,\cdot)) = \ker R = \text{null space}(A).$$

Since $A$ is square and since the row rank and column rank of $A$ are equal, the dimensions of the null spaces of $A$ and $A^t$ are equal. Hence

$$\dim \operatorname{lrad}((\cdot,\cdot)) = \dim \operatorname{rrad}((\cdot,\cdot)).$$
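The continuation of Example 1 can be checked numerically. The sketch below, again assuming exact arithmetic over $\mathbb{Q}$ with `fractions.Fraction`, computes null-space bases by row reduction; `null_space` is a hypothetical helper written for this illustration. The matrix `A` is an arbitrary non-symmetric example for which the two radicals are different lines of the same dimension.

```python
from fractions import Fraction

def transpose(m):
    return [list(row) for row in zip(*m)]

def null_space(m):
    """Basis of {x : m x = 0}, via reduced row echelon form over Q."""
    a = [[Fraction(v) for v in row] for row in m]
    rows, cols = len(a), len(a[0])
    pivots, r = [], 0
    for c in range(cols):
        p = next((i for i in range(r, rows) if a[i][c] != 0), None)
        if p is None:
            continue
        a[r], a[p] = a[p], a[r]
        piv = a[r][c]
        a[r] = [v / piv for v in a[r]]
        for i in range(rows):
            if i != r and a[i][c] != 0:
                f = a[i][c]
                a[i] = [x - f * y for x, y in zip(a[i], a[r])]
        pivots.append(c)
        r += 1
        if r == rows:
            break
    basis = []
    for free in range(cols):
        if free in pivots:
            continue
        x = [Fraction(0)] * cols
        x[free] = Fraction(1)
        for i, pc in enumerate(pivots):
            x[pc] = -a[i][free]
        basis.append(x)
    return basis

A = [[1, 1], [0, 0]]
rrad = null_space(A)              # ker R: vectors v with A v = 0
lrad = null_space(transpose(A))   # ker L: vectors u with u^t A = 0, i.e. A^t u = 0
print(lrad, rrad)
```

Here rrad is spanned by $(-1, 1)^t$ and lrad by $(0, 1)^t$: the subspaces differ, but each is 1-dimensional, as the row-rank argument predicts.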

This equality of dimensions for the case of $\mathbb{K}^n$ extends to general $V$, as is noted in the next proposition.

Proposition 6.2. If $(\cdot,\cdot)$ is any bilinear form on a finite-dimensional vector space $V$, then

$$\dim \operatorname{lrad}((\cdot,\cdot)) = \dim \operatorname{rrad}((\cdot,\cdot)).$$

PROOF. We saw above that computations with bilinear forms on $V$ reduce, once we choose an ordered basis for $V$, to computations with matrices, row vectors, and column vectors. Thus the argument just given in the continuation of Example 1 is completely general, and the proposition is proved. □

A bilinear form $(\cdot,\cdot)$ is said to be nondegenerate if its left radical is 0. In view of Proposition 6.2, it is equivalent to require that the right radical be 0. When the radicals are 0, the associated linear maps $L$ and $R$ from $V$ to $V'$ are one-one. Since $\dim V = \dim V'$, it follows that $L$ and $R$ are onto $V'$. Thus a nondegenerate bilinear form on $V$ sets up two canonical isomorphisms of $V$ with its dual $V'$.

For definiteness let us work with the linear mapping $L : V \to V'$ given by $L(u)(v) = (u, v)$. If $U \subseteq V$ is a vector subspace, define

$$U^\perp = \{u \in V \mid (u, v) = 0 \text{ for all } v \in U\}.$$

It is apparent from the definitions that

$$U \cap U^\perp = \operatorname{lrad}\big((\cdot,\cdot)|_{U \times U}\big).$$
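In contrast to the special case that $\mathbb{K} = \mathbb{R}$ and the bilinear form is an inner product, $U \cap U^\perp$ may be nonzero even if $(\cdot,\cdot)$ is nondegenerate. For example let $V = \mathbb{K}^2$, define

$$\left(\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}, \begin{pmatrix} y_1 \\ y_2 \end{pmatrix}\right) = x_1y_1 - x_2y_2,$$

and suppose that $U$ is the 1-dimensional vector subspace $U = \left\{\begin{pmatrix} x \\ x \end{pmatrix}\right\}$. The matrix of the bilinear form in the standard ordered basis is $\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$; since the matrix is nonsingular, the bilinear form is nondegenerate. Direct calculation, using $\left(\begin{pmatrix} u_1 \\ u_2 \end{pmatrix}, \begin{pmatrix} x \\ x \end{pmatrix}\right) = (u_1 - u_2)x$, shows that $U^\perp = \left\{\begin{pmatrix} x \\ x \end{pmatrix}\right\} = U$, so that $U \cap U^\perp \neq 0$. Nevertheless, in the nondegenerate case the dimensions of $U$ and $U^\perp$ behave as if $U^\perp$ were an orthogonal complement. The precise result is as follows.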

Proposition 6.3. If $(\cdot,\cdot)$ is a nondegenerate bilinear form on the finite-dimensional vector space $V$ and if $U$ is a vector subspace of $V$, then

$$\dim V = \dim U + \dim U^\perp.$$

Proof. Define $\ell : V \to U'$ by $\ell(v)(u) = (v, u)$ for $v \in V$ and $u \in U$. The definition of $U^\perp$ shows that $\ker \ell = U^\perp$. To see that $\operatorname{image} \ell = U'$, choose a vector subspace $U_1$ of $V$ with $V = U \oplus U_1$, let $u'$ be in $U'$, and define $v'$ in $V'$ by

$$v' = \begin{cases} u' & \text{on } U, \\ 0 & \text{on } U_1. \end{cases}$$

Since $(\cdot,\cdot)$ is nondegenerate, the linear mapping $L : V \to V'$ is onto $V'$. Thus we can choose $v \in V$ with $L(v) = v'$. Then

$$\ell(v)(u) = (v, u) = L(v)(u) = v'(u) = u'(u)$$

for all $u$ in $U$, and hence $\ell(v) = u'$. Therefore $\operatorname{image} \ell = U'$, and we conclude that

$$\dim V = \dim(\ker \ell) + \dim(\operatorname{image} \ell) = \dim U^\perp + \dim U' = \dim U^\perp + \dim U.$$

□


Corollary 6.4. If $(\cdot,\cdot)$ is a nondegenerate bilinear form on the finite-dimensional vector space $V$ and if $U$ is a vector subspace of $V$, then $V = U \oplus U^\perp$ if and only if $(\cdot,\cdot)|_{U \times U}$ is nondegenerate.

Proof. Corollary 2.29 and Proposition 6.3 together give

$$\dim(U + U^\perp) + \dim(U \cap U^\perp) = \dim U + \dim U^\perp = \dim V.$$

Thus $U + U^\perp = V$ if and only if $U \cap U^\perp = 0$, if and only if $(\cdot,\cdot)|_{U \times U}$ is nondegenerate. The result therefore follows from Proposition 2.30. □
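The example preceding Proposition 6.3 and the criterion of Corollary 6.4 can be tested with a short sketch. It assumes exact arithmetic over $\mathbb{Q}$; `null_space`, `perp`, and `gram` are illustrative helpers of our own, with `perp` computing $U^\perp$ as the null space of the rows $(Aw)^t$ for basis vectors $w$ of $U$ (since $(u, w) = u^tAw$), and `gram` giving the matrix of the restricted form $(\cdot,\cdot)|_{U \times U}$.

```python
from fractions import Fraction

def null_space(m):
    """Basis of {x : m x = 0}, via reduced row echelon form over Q."""
    a = [[Fraction(v) for v in row] for row in m]
    rows, cols = len(a), len(a[0])
    pivots, r = [], 0
    for c in range(cols):
        p = next((i for i in range(r, rows) if a[i][c] != 0), None)
        if p is None:
            continue
        a[r], a[p] = a[p], a[r]
        piv = a[r][c]
        a[r] = [v / piv for v in a[r]]
        for i in range(rows):
            if i != r and a[i][c] != 0:
                f = a[i][c]
                a[i] = [x - f * y for x, y in zip(a[i], a[r])]
        pivots.append(c)
        r += 1
        if r == rows:
            break
    basis = []
    for free in (c for c in range(cols) if c not in pivots):
        x = [Fraction(0)] * cols
        x[free] = Fraction(1)
        for i, pc in enumerate(pivots):
            x[pc] = -a[i][free]
        basis.append(x)
    return basis

def perp(A, U_basis):
    """U-perp = {u : u^t A w = 0 for every basis vector w of U}."""
    n = len(A)
    rows = [[sum(A[i][j] * w[j] for j in range(n)) for i in range(n)] for w in U_basis]
    return null_space(rows)

def gram(A, U_basis):
    """Matrix of the form restricted to U, relative to the given basis of U."""
    n = len(A)
    return [[sum(x[i] * A[i][j] * y[j] for i in range(n) for j in range(n))
             for y in U_basis] for x in U_basis]

A = [[1, 0], [0, -1]]                 # (x, y) = x1*y1 - x2*y2, a nondegenerate form
for U in ([[1, 1]], [[1, 0]]):
    print(U, perp(A, U), gram(A, U))
```

For $U$ spanned by $(1,1)^t$ one finds $U^\perp = U$ and a zero restricted matrix, so the restriction is degenerate and $V \neq U \oplus U^\perp$; for $U$ spanned by $(1,0)^t$ the restriction is nondegenerate and $U^\perp$ is spanned by $(0,1)^t$. In both cases $\dim U + \dim U^\perp = 2$.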
2.  Symmetric  Bilinear  Forms

We continue with the setting in which $\mathbb{K}$ is a field and all vector spaces of interest are defined over $\mathbb{K}$ and are finite-dimensional.

A bilinear form $(\cdot,\cdot)$ on $V$ is said to be symmetric if $(u, v) = (v, u)$ for all $u$ and $v$ in $V$, skew-symmetric if $(u, v) = -(v, u)$ for all $u$ and $v$ in $V$, and alternating if $(u, u) = 0$ for all $u$ in $V$.

“Alternating” always implies “skew-symmetric.” In fact, if $(\cdot,\cdot)$ is alternating, then $0 = (u + v, u + v) = (u, u) + (u, v) + (v, u) + (v, v) = (u, v) + (v, u)$; thus $(\cdot,\cdot)$ is skew-symmetric. If $\mathbb{K}$ has characteristic different from 2, then the converse is valid: “skew-symmetric” implies “alternating.” In fact, if $(\cdot,\cdot)$ is skew-symmetric, then $(u, u) = -(u, u)$ and hence $2(u, u) = 0$; thus $(u, u) = 0$, and $(\cdot,\cdot)$ is alternating.

Let us examine further the effect of the characteristic of $\mathbb{K}$. If, on the one hand, $\mathbb{K}$ has characteristic different from 2, the most general bilinear form $(\cdot,\cdot)$ is the sum of the symmetric form $(\cdot,\cdot)_s$ and the alternating form $(\cdot,\cdot)_a$ given by

$$(u, v)_s = \tfrac{1}{2}\big((u, v) + (v, u)\big), \qquad (u, v)_a = \tfrac{1}{2}\big((u, v) - (v, u)\big).$$

In this sense the symmetric and alternating bilinear forms are the extreme cases among all bilinear forms, and we shall study the two cases separately.
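In matrix terms, with $(u, v) = u^tAv$, the decomposition above splits $A$ into $\tfrac12(A + A^t)$ and $\tfrac12(A - A^t)$, since $\tfrac12\big(u^tAv + v^tAu\big) = \tfrac12 u^t(A + A^t)v$. A minimal sketch over $\mathbb{Q}$ (characteristic 0, so dividing by 2 is allowed); the matrix and the function name `split_form_matrix` are illustrative.

```python
from fractions import Fraction

def split_form_matrix(A):
    """Return (S, Alt) with A = S + Alt, S symmetric, Alt alternating.

    Requires 2 to be invertible, which holds over Q.
    """
    n = len(A)
    half = Fraction(1, 2)
    S = [[half * (A[i][j] + A[j][i]) for j in range(n)] for i in range(n)]
    Alt = [[half * (A[i][j] - A[j][i]) for j in range(n)] for i in range(n)]
    return S, Alt

A = [[1, 2], [4, 3]]
S, Alt = split_form_matrix(A)
print(S, Alt)
```

The alternating part automatically has 0's on the diagonal, matching $(u, u)_a = 0$.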

If, on the other hand, $\mathbb{K}$ has characteristic 2, then “alternating” implies “skew-symmetric” but not conversely. “Alternating” is a serious restriction, and we shall be able to deal with it. However, “symmetric” and “skew-symmetric” are equivalent since $1 = -1$, and thus neither condition is much of a restriction; we shall not attempt to say anything insightful in these cases.

In this section we study symmetric bilinear forms, obtaining results when $\mathbb{K}$ has characteristic different from 2. From the symmetry it is apparent that the left and right radicals of a symmetric bilinear form are the same, and we call this vector subspace the radical of the form. By way of an example, here is a continuation of Example 1 from the previous section.

Example. Let $V = \mathbb{K}^n$, let $A$ be a symmetric $n$-by-$n$ matrix (i.e., one with $A^t = A$), and let $(u, v) = u^tAv$. The computation $(v, u) = v^tAu = (v^tAu)^t = u^tA^tv = u^tAv = (u, v)$ shows that the bilinear form $(\cdot,\cdot)$ is symmetric; the second equality $v^tAu = (v^tAu)^t$ holds since $v^tAu$ is a 1-by-1 matrix.

Again the example is completely general. In fact, if $\Gamma = (v_1, \dots, v_n)$ is an ordered basis of a vector space $V$ and if $(\cdot,\cdot)$ is a given symmetric bilinear form on $V$, then the matrix of the form has entries $A_{ij} = (v_i, v_j)$, and these evidently satisfy $A_{ij} = A_{ji}$. So $A$ is a symmetric matrix, and computations with the bilinear form are reduced to those used in the example.
Theorem 6.5 (Principal Axis Theorem). Suppose that $\mathbb{K}$ has characteristic different from 2.

(a) If $(\cdot,\cdot)$ is a symmetric bilinear form on a finite-dimensional vector space $V$, then there exists an ordered basis of $V$ in which the matrix of $(\cdot,\cdot)$ is diagonal.

(b) If $A$ is an $n$-by-$n$ symmetric matrix, then there exists a nonsingular $n$-by-$n$ matrix $M$ such that $M^tAM$ is diagonal.

Remarks. Because computations with general symmetric bilinear forms reduce to computations in the special case of a symmetric matrix and because Proposition 6.1 tells the effect of a change of ordered basis, (a) and (b) amount to the same result; nevertheless, we give two proofs of Theorem 6.5: a proof via matrices and a proof via linear maps. A hint of the validity of the theorem comes from the case that $\mathbb{K} = \mathbb{R}$. For the field $\mathbb{R}$ when the bilinear form is an inner product, the Spectral Theorem (Theorem 3.21) says that there is an orthonormal basis of eigenvectors and hence that (a) holds. When $\mathbb{K} = \mathbb{R}$, the same theorem says that there exists an orthogonal matrix $M$ with $M^{-1}AM$ diagonal; since any orthogonal matrix $M$ satisfies $M^{-1} = M^t$, the Spectral Theorem is saying that (b) holds.

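Proof via matrices. If $A$ is an $n$-by-$n$ symmetric matrix, we seek a nonsingular $M$ with $M^tAM$ diagonal. We induct on the size of $A$, the base case of the induction being $n = 1$, where there is nothing to prove. Assume the result to be known for size $n - 1$, and write the given $n$-by-$n$ matrix $A$ in block form as $A = \begin{pmatrix} a & b \\ b^t & d \end{pmatrix}$ with $d$ of size 1-by-1. If $d \neq 0$, let $x$ be the column vector $-d^{-1}b$. Then

$$\begin{pmatrix} I & x \\ 0 & 1 \end{pmatrix} \begin{pmatrix} a & b \\ b^t & d \end{pmatrix} \begin{pmatrix} I & 0 \\ x^t & 1 \end{pmatrix} = \begin{pmatrix} * & 0 \\ 0 & d \end{pmatrix},$$

and the induction goes through. If $d = 0$, we argue in a different way. We may assume that $b \neq 0$ since otherwise the result is immediate by induction. Say $b_i \neq 0$ with $1 \le i \le n - 1$. Let $y$ be an $(n-1)$-dimensional row vector with $i$th entry a member $\delta$ of $\mathbb{K}$ to be specified and with other entries 0. Then

$$\begin{pmatrix} I & 0 \\ y & 1 \end{pmatrix} \begin{pmatrix} a & b \\ b^t & 0 \end{pmatrix} \begin{pmatrix} I & y^t \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} * & * \\ * & yay^t + b^ty^t + yb \end{pmatrix} = \begin{pmatrix} * & * \\ * & \delta^2 a_{ii} + 2\delta b_i \end{pmatrix}.$$

Since $\mathbb{K}$ has characteristic different from 2, $2b_i$ is not 0; thus the polynomial $\delta(\delta a_{ii} + 2b_i)$ is not identically 0 and has at most two roots, while $\mathbb{K}$ has at least three elements, so there is some value of $\delta$ for which $\delta^2 a_{ii} + 2\delta b_i \neq 0$. Then we are reduced to the case $d \neq 0$, which we have already handled, and the induction goes through. □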
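The matrix proof is constructive, and it translates directly into an algorithm. The sketch below is one way to carry it out over $\mathbb{Q}$ with exact `Fraction` arithmetic; the function names are hypothetical. Like the proof, it works on the last diagonal entry $d$ of the active block: if $d = 0$ but some $b_i \neq 0$, it first applies the congruence with $y = \delta e_i^t$ for a suitable $\delta$; then it clears the rest of the last row and column using $x = -d^{-1}b$ and shrinks the block. Instead of forming the block matrices, it applies equivalent elementary congruences $A \mapsto E^tAE$ one entry at a time and accumulates $M$.

```python
from fractions import Fraction

def transpose(m):
    return [list(row) for row in zip(*m)]

def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]

def congruence_step(A, M, i, j, c):
    """Replace A by E^t A E and M by M E, where E = I + c * e_i e_j^t."""
    n = len(A)
    for r in range(n):          # A E: column j += c * column i
        A[r][j] += c * A[r][i]
    for r in range(n):          # E^t (A E): row j += c * row i
        A[j][r] += c * A[i][r]
    for r in range(n):          # M E: column j += c * column i
        M[r][j] += c * M[r][i]

def diagonalize_symmetric(A0):
    """Return (D, M) with M nonsingular and D = M^t A0 M diagonal."""
    n = len(A0)
    A = [[Fraction(v) for v in row] for row in A0]
    M = [[Fraction(int(r == c)) for c in range(n)] for r in range(n)]
    for m in range(n - 1, 0, -1):      # the proof's "d" is A[m][m]
        if A[m][m] == 0:
            i = next((k for k in range(m) if A[k][m] != 0), None)
            if i is None:
                continue               # b = 0: pass straight to the smaller block
            # the d = 0 case: pick delta with delta^2 a_ii + 2 delta b_i != 0
            delta = next(t for t in (1, 2, 3)
                         if t * t * A[i][i] + 2 * t * A[i][m] != 0)
            congruence_step(A, M, i, m, Fraction(delta))
        d = A[m][m]
        for k in range(m):             # the d != 0 case: x = -b / d, entry by entry
            if A[k][m] != 0:
                congruence_step(A, M, m, k, -A[k][m] / d)
    return A, M

D, M = diagonalize_symmetric([[0, 1], [1, 0]])
print(D, M)
```

For $A = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$ the entry $d$ is 0, so the $\delta$ step is needed, and the sketch returns $D = \mathrm{diag}(-\tfrac12, 2)$. Over $\mathbb{Q}$ a value of $\delta$ in $\{1, 2\}$ always works, because the only roots of $\delta(\delta a_{ii} + 2b_i)$ are $0$ and $-2b_i/a_{ii}$.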

Proof via linear maps. We may assume that the given symmetric bilinear form is not identically 0, since otherwise any basis will do. Let the radical of the form be denoted by $\mathrm{rad} = \mathrm{rad}((\cdot,\cdot))$. Choose a vector subspace $S$ of $V$ such that $V = \mathrm{rad} \oplus S$, and put $[\cdot,\cdot] = (\cdot,\cdot)|_{S \times S}$. Then $[\cdot,\cdot]$ is a symmetric bilinear form on $S$, and it is nondegenerate. In fact, $[u, \cdot] = 0$ means $(u, v) = 0$ for all $v \in S$; since $(u, v) = 0$ for $v$ in rad anyway, $(u, v) = 0$ for all $v \in V$, $u$ is in rad as well as $S$, and $u = 0$.

Since $(\cdot,\cdot)$ is not identically 0, the subspace $S$ is not 0. Thus the nondegenerate symmetric bilinear form $[\cdot,\cdot]$ on $S$ is not 0. Since

$$[u, v] = \tfrac{1}{2}\big([u + v, u + v] - [u, u] - [v, v]\big),$$

a pair with $[u, v] \neq 0$ forces one of $[u + v, u + v]$, $[u, u]$, $[v, v]$ to be nonzero; hence $[u, u] \neq 0$ for some $u$ in $S$. Put $U_1 = \mathbb{K}u$. Then $[\cdot,\cdot]|_{U_1 \times U_1}$ is nondegenerate, and Corollary 6.4 implies that $S = U_1 \oplus U_1^\perp$. Applying the converse direction of the same corollary to $U_1^\perp$, we see that $[\cdot,\cdot]|_{U_1^\perp \times U_1^\perp}$ is nondegenerate. Repeating this construction with $U_1^\perp$ and iterating, we obtain

$$V = \mathrm{rad} \oplus U_1 \oplus \cdots \oplus U_k$$

with $(U_i, U_j) = 0$ for $i \neq j$ and with $\dim U_i = 1$ for all $i$. This completes the proof. □

Theorem  6.5  fails  in  characteristic  2.  Problem  2  at  the  end  of  the  chapter 
illustrates  the  failure. 

Let us examine the matrix version of Theorem 6.5 more closely when $\mathbb{K}$ is $\mathbb{C}$ or $\mathbb{R}$. The theorem says that if $A$ is $n$-by-$n$ symmetric, then we can find a nonsingular $M$ with $B = M^tAM$ diagonal. Taking $D$ diagonal and nonsingular and forming $C = D^tBD$, we see that we can adjust the diagonal entries of $B$ by arbitrary nonzero squares. Over $\mathbb{C}$, where every element is a square, we can therefore arrange (after reordering the basis if necessary) that $C$ is of the form $\mathrm{diag}(1, \dots, 1, 0, \dots, 0)$. The number of 1's equals the rank, and this has to be the same as the rank of the given matrix $A$. The form is nondegenerate if and only if there are no 0's. Thus we understand everything about the diagonal form.

Over $\mathbb{R}$, matters are more subtle. We can arrange that $C$ is of the form $\mathrm{diag}(\pm1, \dots, \pm1, 0, \dots, 0)$, the various signs ostensibly not being correlated. Replacing $C$ by $P^tCP$ with $P$ a permutation matrix, we may assume that our diagonal matrix is of the form $\mathrm{diag}(+1, \dots, +1, -1, \dots, -1, 0, \dots, 0)$. The number of $+1$'s and $-1$'s together is again the rank of $A$, and the form is nondegenerate if and only if there are no 0's. But what about the separate numbers of $+1$'s and $-1$'s? The triple given by

$$(p, m, z) = (\#(+1)\text{'s}, \#(-1)\text{'s}, \#(0)\text{'s})$$

is called the signature of $A$ when $\mathbb{K} = \mathbb{R}$. A similar notion can be defined in the case of a symmetric bilinear form over $\mathbb{R}$.

Theorem 6.6 (Sylvester's Law). The signature of an $n$-by-$n$ symmetric matrix over $\mathbb{R}$ is well defined.
PROOF. The integer $p + m$ is the rank, which does not change under a transformation $A \mapsto M^tAM$ if $M$ is nonsingular. Thus we may take $z$ as known. Let $(p', m', z)$ and $(p, m, z)$ be two signatures for a symmetric matrix $A$. Define the corresponding symmetric bilinear form on $\mathbb{R}^n$ by $(u, v) = u^tAv$. Let $(v'_1, \dots, v'_n)$ and $(v_1, \dots, v_n)$ be ordered bases of $\mathbb{R}^n$ diagonalizing the bilinear form and exhibiting the resulting signature, i.e., having $(v'_i, v'_j) = (v_i, v_j) = 0$ for $i \neq j$ and having

$$(v_i, v_i) = \begin{cases} +1 & \text{for } 1 \le i \le p, \\ -1 & \text{for } p < i \le p + m, \\ 0 & \text{for } p + m < i \le n, \end{cases} \qquad (v'_i, v'_i) = \begin{cases} +1 & \text{for } 1 \le i \le p', \\ -1 & \text{for } p' < i \le p' + m', \\ 0 & \text{for } p' + m' < i \le n. \end{cases}$$

We shall prove that $\{v_1, \dots, v_p, v'_{p'+1}, \dots, v'_n\}$ is linearly independent, and then we must have $p' \ge p$, since a linearly independent subset of $\mathbb{R}^n$ has at most $n$ elements. Reversing the roles of $p$ and $p'$, we see that $p' = p$ and (because $p + m = p' + m'$) $m' = m$, and the theorem is proved. Thus suppose we have a linear dependence:

$$a_1v_1 + \cdots + a_pv_p = b_{p'+1}v'_{p'+1} + \cdots + b_nv'_n.$$

Let $v$ be the common value of the two sides of this equation. Then

$$(v, v) = (a_1v_1 + \cdots + a_pv_p,\ a_1v_1 + \cdots + a_pv_p) = \sum_{i=1}^{p} a_i^2 \ge 0$$

and

$$(v, v) = (b_{p'+1}v'_{p'+1} + \cdots + b_nv'_n,\ b_{p'+1}v'_{p'+1} + \cdots + b_nv'_n) = -\sum_{j=p'+1}^{p'+m'} b_j^2 \le 0.$$

We conclude that $(v, v) = 0$, $\sum_{i=1}^{p} a_i^2 = 0$, and $a_1 = \cdots = a_p = 0$. Thus $v = 0$ and $b_{p'+1}v'_{p'+1} + \cdots + b_nv'_n = 0$. Since $\{v'_{p'+1}, \dots, v'_n\}$ is linearly independent, we obtain also $b_{p'+1} = \cdots = b_n = 0$. Therefore $\{v_1, \dots, v_p, v'_{p'+1}, \dots, v'_n\}$ is a linearly independent set, and the proof is complete. □
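Sylvester's Law can be illustrated numerically without diagonalizing by congruence. For a real symmetric matrix the signature equals the numbers of positive, negative, and zero eigenvalues (by the Spectral Theorem, an orthogonal diagonalization is also a congruence), and because every root of the characteristic polynomial is real, Descartes' rule of signs counts the positive ones exactly. The sketch below uses this substitute technique rather than the method of the proof: it computes the characteristic polynomial over $\mathbb{Q}$ by the Faddeev–LeVerrier recursion and checks that $A$ and $M^tAM$ have the same signature, even though their eigenvalues differ. The matrices are illustrative, and the helper names are our own.

```python
from fractions import Fraction

def transpose(m):
    return [list(row) for row in zip(*m)]

def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]

def char_poly(A):
    """Coefficients of det(tI - A), highest degree first (Faddeev-LeVerrier)."""
    n = len(A)
    A = [[Fraction(v) for v in row] for row in A]
    coeffs = [Fraction(1)]
    Mk = [[Fraction(0)] * n for _ in range(n)]
    for k in range(1, n + 1):
        AM = matmul(A, Mk)
        # M_k = A M_{k-1} + c_{n-k+1} I
        Mk = [[AM[i][j] + (coeffs[-1] if i == j else 0) for j in range(n)]
              for i in range(n)]
        AMk = matmul(A, Mk)
        # c_{n-k} = -trace(A M_k) / k
        coeffs.append(-sum(AMk[i][i] for i in range(n)) / k)
    return coeffs

def sign_changes(seq):
    signs = [c > 0 for c in seq if c != 0]
    return sum(1 for s, t in zip(signs, signs[1:]) if s != t)

def signature(A):
    """(p, m, z) for a real symmetric matrix A with rational entries."""
    coeffs, n = char_poly(A), len(A)
    z = 0
    while z < n and coeffs[-1 - z] == 0:   # multiplicity of the eigenvalue 0
        z += 1
    p = sign_changes(coeffs)               # exact here: all eigenvalues are real
    return (p, n - p - z, z)

A = [[0, 1], [1, 0]]
M = [[1, 2], [0, 1]]
print(signature(A), signature(matmul(matmul(transpose(M), A), M)))
```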


3.  Alternating  Bilinear  Forms 

We continue with the setting in which $\mathbb{K}$ is a field and all vector spaces of interest are defined over $\mathbb{K}$ and are finite-dimensional.
In this section we study alternating bilinear forms, imposing no restriction on the characteristic of $\mathbb{K}$. From the skew symmetry of any alternating bilinear form it is apparent that the left and right radicals of such a form are the same, and we call this vector subspace the radical of the form. First let us consider examples given in terms of matrices. Temporarily let us separate matters according to the characteristic.

Example 1 of Section 1 with $\mathbb{K}$ of characteristic $\neq 2$. Let $V = \mathbb{K}^n$, let $A$ be a skew-symmetric $n$-by-$n$ matrix (i.e., one with $A^t = -A$), and let $(u, v) = u^tAv$. The computation $(v, u) = v^tAu = (v^tAu)^t = u^tA^tv = -u^tAv = -(u, v)$ shows that the bilinear form $(\cdot,\cdot)$ is skew-symmetric, hence alternating.

Example 1 of Section 1 with $\mathbb{K}$ of characteristic 2. Let $V = \mathbb{K}^n$, let $A$ be an $n$-by-$n$ matrix, and define $(u, v) = u^tAv$. We suppose that $A$ is skew-symmetric; it is the same to assume that $A$ is symmetric since the characteristic is 2. In order to have $(e_i, e_i) = 0$ for each standard basis vector $e_i$, we shall assume that $A_{ii} = 0$ for all $i$. If $u$ is a column vector with entries $u_1, \dots, u_n$, then

$$(u, u) = u^tAu = \sum_{i,j} u_iA_{ij}u_j = \sum_{i \neq j} u_iA_{ij}u_j = \sum_{i<j} (A_{ij}u_iu_j + A_{ji}u_ju_i) = \sum_{i<j} 2A_{ij}u_iu_j = 0.$$

Hence the bilinear form $(\cdot,\cdot)$ is alternating.
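Over a small prime field, the characteristic-2 example, and the failure of the converse “skew-symmetric implies alternating,” can be confirmed by brute force over all vectors of $\mathbb{F}_p^n$. This is a sketch with illustrative matrices; integers reduced mod $p$ stand in for the field, and the function names are our own.

```python
from itertools import product

def form_mod_p(A, u, v, p):
    """(u, v) = u^t A v computed mod p."""
    n = len(A)
    return sum(u[i] * A[i][j] * v[j] for i in range(n) for j in range(n)) % p

def is_alternating(A, p):
    """Brute force: (u, u) == 0 for every u in F_p^n."""
    return all(form_mod_p(A, u, u, p) == 0
               for u in product(range(p), repeat=len(A)))

A_zero_diag = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]  # symmetric = skew-symmetric mod 2, zero diagonal
A_one_diag = [[1, 1], [1, 0]]                     # skew-symmetric mod 2, but A_11 = 1
print(is_alternating(A_zero_diag, 2), is_alternating(A_one_diag, 2))
```

The second matrix is skew-symmetric mod 2 yet gives $(e_1, e_1) = 1$, so in characteristic 2 the zero-diagonal assumption is genuinely needed; by contrast, in characteristic 3 a skew-symmetric matrix such as $\begin{pmatrix} 0 & 1 \\ 2 & 0 \end{pmatrix}$ is automatically alternating.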

Again the examples are completely general. In fact, if $\Gamma = (v_1, \dots, v_n)$ is an ordered basis of a vector space $V$ and if $(\cdot,\cdot)$ is a given alternating bilinear form, then the matrix of the form has entries $A_{ij} = (v_i, v_j)$ that evidently satisfy $A_{ij} = -A_{ji}$ and $A_{ii} = 0$. So $A$ is a skew-symmetric matrix with 0's on the diagonal, and computations with the bilinear form are reduced to those used in the examples. To keep the terminology parallel, let us say that a square matrix is alternating if it is skew-symmetric and has 0's on the diagonal.

Theorem 6.7.

(a) If $(\cdot,\cdot)$ is an alternating bilinear form on a finite-dimensional vector space $V$, then there exists an ordered basis of $V$ in which the matrix of $(\cdot,\cdot)$ has the block-diagonal form

$$\begin{pmatrix} J & & & & & \\ & \ddots & & & & \\ & & J & & & \\ & & & 0 & & \\ & & & & \ddots & \\ & & & & & 0 \end{pmatrix}, \qquad J = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}.$$

If $(\cdot,\cdot)$ is nondegenerate, then $\dim V$ is even.

(b) If $A$ is an $n$-by-$n$ alternating matrix, then there exists a nonsingular $n$-by-$n$ matrix $M$ such that $M^tAM$ is as in (a).

PROOF. It is enough to prove (a). Let rad be the radical of the given form $(\cdot,\cdot)$, and choose a vector subspace $S$ of $V$ with $V = \mathrm{rad} \oplus S$. The restriction of $(\cdot,\cdot)$ to $S$ is then alternating and nondegenerate. We may now proceed by induction on $\dim V$ under the assumption that $(\cdot,\cdot)$ is nondegenerate. For $\dim V = 1$, the form is degenerate, since $(u, u) = 0$ for every $u$. For $\dim V = 2$, we can find $u$ and $v$ with $(u, v) \neq 0$, and we can normalize one of the vectors to make $(u, v) = 1$. Then $(u, v)$ is the required ordered basis.

Assuming the result in the nondegenerate case for dimension $< n$, suppose that $\dim V = n$. Again choose $u$ and $v$ with $(u, v) = 1$; they are linearly independent since $(u, cu) = 0$ for every scalar $c$. Define $U = \mathbb{K}u \oplus \mathbb{K}v$. Then $(\cdot,\cdot)|_{U \times U}$ has matrix $\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$ and is nondegenerate. By Corollary 6.4, $V = U \oplus U^\perp$, and an application of the converse of the corollary shows that $(\cdot,\cdot)|_{U^\perp \times U^\perp}$ is nondegenerate. The induction hypothesis applies to $U^\perp$, and we obtain the desired matrix for the given form. □
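The induction in the proof of Theorem 6.7 can be run as a symplectic analog of Gram–Schmidt. The following sketch assumes exact arithmetic over $\mathbb{Q}$ and uses illustrative names and matrices: it repeatedly picks $u, v$ with $(u, v) \neq 0$, rescales so that $(u, v) = 1$, replaces every remaining vector $w$ by $w - (w, v)u + (w, u)v$ (which is orthogonal to both $u$ and $v$, hence lies in $U^\perp$ for $U = \mathbb{K}u \oplus \mathbb{K}v$), and stops when the form vanishes on what is left, which then spans the radical.

```python
from fractions import Fraction

def transpose(m):
    return [list(row) for row in zip(*m)]

def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]

def symplectic_basis(A):
    """Columns of the returned M are u1, v1, u2, v2, ..., then a basis of the
    radical, so that M^t A M = diag(J, ..., J, 0, ..., 0)."""
    n = len(A)

    def form(x, y):
        return sum(x[i] * A[i][j] * y[j] for i in range(n) for j in range(n))

    remaining = [[Fraction(int(i == k)) for i in range(n)] for k in range(n)]
    basis = []
    while True:
        pair = next(((i, j) for i in range(len(remaining))
                     for j in range(len(remaining))
                     if form(remaining[i], remaining[j]) != 0), None)
        if pair is None:
            break                      # form vanishes on span(remaining): the radical
        i, j = pair
        u = remaining[i]
        c = form(u, remaining[j])
        v = [x / c for x in remaining[j]]          # now (u, v) = 1
        rest = [w for k, w in enumerate(remaining) if k not in (i, j)]
        remaining = []
        for w in rest:                 # project w into U-perp
            a, b = form(w, v), form(w, u)
            remaining.append([w[k] - a * u[k] + b * v[k] for k in range(n)])
        basis += [u, v]
    basis += remaining
    return transpose(basis)            # basis vectors become the columns of M

A = [[0, 2, 1], [-2, 0, 3], [-1, -3, 0]]       # alternating, necessarily degenerate
M = symplectic_basis(A)
print(matmul(matmul(transpose(M), A), M))
```

With the basis vectors as columns of $M$, the product $M^tAM$ has the block form of Theorem 6.7(a); for this odd-sized example it is one block $J$ followed by a single 0.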


4.  Hermitian  Forms 

In this section the field will be $\mathbb{C}$, and $V$ will be a finite-dimensional vector space over $\mathbb{C}$.

A sesquilinear form $(\cdot,\cdot)$ on $V$ is a function from $V \times V$ into $\mathbb{C}$ that is linear in the first variable and conjugate linear in the second.¹ Sesquilinear forms do not make sense for general fields because of the absence of a universal analog of complex conjugation, and we shall consequently work only with the field $\mathbb{C}$ in this section.²

A sesquilinear form $(\cdot,\cdot)$ is Hermitian if $(u, v) = \overline{(v, u)}$ for all $u$ and $v$ in $V$. The form is skew-Hermitian if instead $(u, v) = -\overline{(v, u)}$ for all $u$ and $v$ in $V$. Hermitian and skew-Hermitian forms are the extreme types of sesquilinear forms since any sesquilinear form $(\cdot,\cdot)$ is the sum of a Hermitian form $(\cdot,\cdot)_h$ and a skew-Hermitian form $(\cdot,\cdot)_{sh}$ given by

$$(u, v)_h = \tfrac{1}{2}\big((u, v) + \overline{(v, u)}\big), \qquad (u, v)_{sh} = \tfrac{1}{2}\big((u, v) - \overline{(v, u)}\big).$$

¹ Some authors, particularly in mathematical physics, reverse the roles of the two variables and assume the conjugate linearity in the first variable instead of the second.

² Sesquilinear forms make sense in number fields like $\mathbb{Q}[\sqrt{2}]$ that have an automorphism of order 2 (see Section IV.1), but sesquilinear forms in this kind of setting will not concern us here.
In addition, any skew-Hermitian form becomes a Hermitian form simply by multiplying by $i$. Specifically if $(\cdot,\cdot)_{sh}$ is skew-Hermitian, then $i(\cdot,\cdot)_{sh}$ is sesquilinear and Hermitian, as is readily checked. Consequently the study of skew-Hermitian forms immediately reduces to the study of Hermitian forms.

Example. Let $V = \mathbb{C}^n$, and let $A$ be a Hermitian matrix, i.e., one with $A^* = A$, where $A^*$ is the conjugate transpose of $A$. Then it is a simple matter to check that $(u, v) = v^*Au$ defines a Hermitian form on $\mathbb{C}^n$.

Again the example with a matrix is completely general. In fact, let $(\cdot,\cdot)$ be a Hermitian form on $V$, let $\Gamma = (v_1, \dots, v_n)$ be an ordered basis of $V$, and define $A_{ij} = (v_i, v_j)$. Then $A$ is a Hermitian matrix, and $(u, v) = u^tA\bar{v}$, where $u$ and $v$ are identified with their coordinate column vectors relative to $\Gamma$ and $\bar{v}$ is the entry-by-entry complex conjugate of $v$.

If $\Delta = (w_1, \dots, w_n)$ is a second ordered basis, then the formula for changing basis may be derived as follows: Write $w_j = \sum_i c_{ij}v_i$, so that $C = [c_{ij}]$ is the change-of-basis matrix. If $B_{ij} = (w_i, w_j)$, then $B_{ij} = \big(\sum_k c_{ki}v_k, \sum_l c_{lj}v_l\big) = \sum_{k,l} c_{ki}(v_k, v_l)\bar{c}_{lj}$, and hence

$$B = C^tA\bar{C}.$$

Thus two Hermitian matrices $A$ and $B$ represent the same Hermitian form in different bases if and only if $B = M^*AM$ for some nonsingular matrix $M$ (namely $M = \bar{C}$).
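The change-of-basis formula $B = C^tA\bar{C} = M^*AM$ with $M = \bar{C}$ can be checked with Python's built-in complex numbers. The matrices below are illustrative choices with small Gaussian-integer entries, so the floating-point arithmetic happens to be exact here; `form` is our own helper evaluating $(x, y) = x^tA\bar{y}$ on coordinate vectors.

```python
def transpose(m):
    return [list(row) for row in zip(*m)]

def conj(m):
    return [[z.conjugate() for z in row] for row in m]

def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]

def form(A, x, y):
    """(x, y) = x^t A conj(y): linear in x, conjugate linear in y."""
    n = len(A)
    return sum(x[k] * A[k][l] * y[l].conjugate() for k in range(n) for l in range(n))

A = [[2, 1 + 1j], [1 - 1j, -1]]    # Hermitian: equal to its conjugate transpose
C = [[1, 1j], [0, 2]]               # column j holds the coefficients c_ij of w_j

w = transpose(C)                    # w[j] = Gamma-coordinates of w_j
B_direct = [[form(A, w[i], w[j]) for j in range(2)] for i in range(2)]
M = conj(C)
B_matrix = matmul(matmul(transpose(conj(M)), A), M)   # M^* A M
print(B_direct, B_matrix)
```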

Proposition 6.8.

(a) If $(\cdot,\cdot)$ is a Hermitian form on a finite-dimensional vector space $V$ over $\mathbb{C}$, then there exists an ordered basis of $V$ in which the matrix of $(\cdot,\cdot)$ is diagonal with real entries.

(b) If $A$ is an $n$-by-$n$ Hermitian matrix, then there exists a nonsingular $n$-by-$n$ matrix $M$ such that $M^*AM$ is diagonal.

PROOF. The above considerations show that (a) and (b) are reformulations of the same result. Hence it is enough to prove (b). By the Spectral Theorem (Theorem 3.21), there exists a unitary matrix $U$ such that $U^{-1}AU$ is diagonal with real entries. Since $U$ is unitary, $U^{-1} = U^*$. Thus we can take $M = U$ to prove (b). □

Just as with symmetric bilinear forms over $\mathbb{R}$, we can do a little better than Proposition 6.8 indicates. If $B$ is Hermitian and diagonal with diagonal entries $b_i$, and if $D$ is diagonal with positive entries $d_i$, then $C = DBD$ is diagonal with diagonal entries $d_i^2b_i$. Choosing $D$ suitably and then replacing $C$ by $P^tCP$ for a suitable permutation matrix $P$, we may assume that $P^tCP$ is of the form $\mathrm{diag}(+1, \dots, +1, -1, \dots, -1, 0, \dots, 0)$. The number of $+1$'s and $-1$'s together is the rank of $A$, and the form is nondegenerate if and only if there are no 0's. The triple given by

$$(p, m, z) = (\#(+1)\text{'s}, \#(-1)\text{'s}, \#(0)\text{'s})$$

is again called the signature of $A$. A similar notion can be defined in the case of a Hermitian form, as opposed to a Hermitian matrix.

Theorem  6.9  (Sylvester's  Law).  The  signature  of  an  n-by-n  Hermitian  matrix 
is  well  defined. 

The  proof  is  the  same  as  for  Theorem  6.6  except  for  adjustments  in  notation. 


5.  Groups  Leaving  a  Bilinear  Form  Invariant 

Although it is not logically necessary to do so, we digress in this section to introduce some important groups that are defined by means of bilinear or Hermitian forms. These groups arise in many areas of mathematics, both pure and applied, and their detailed structure constitutes a topic in the fields of Lie groups, algebraic groups, and finite groups that is beyond the scope of this book. Thus the best place to define them seems to be now.

We limit our comments on applications to just these: When the underlying field in the definition of these groups is $\mathbb{R}$ or $\mathbb{C}$, the group is quite often a “simple Lie group,” one of the basic building blocks of the theory of the continuous groups that so often arise in topology, geometry, differential equations, and mathematical physics. When the underlying field is a number field in the sense of Example 9 of Section IV.1, the group quite often plays a role in algebraic number theory. When the underlying field is a finite field, the group is often closely related to a finite simple group; an example of this relationship occurred in Problems 55–62 at the end of Chapter IV, where it was shown that the group PSL(2, $\mathbb{K}$), built in an easy way from the general linear group GL(2, $\mathbb{K}$), is simple if the field $\mathbb{K}$ has more than 5 elements. More general examples of finite simple groups produced by analogous constructions are said to be of “Lie type.” A celebrated theorem of the late twentieth century classified the finite simple groups, establishing that the only such groups are the cyclic groups of prime order, the alternating groups on 5 or more letters, the simple groups of Lie type, and 26 so-called sporadic simple groups.

If $(\cdot,\cdot)$ is a bilinear form on an $n$-dimensional vector space $V$ over a field $\mathbb{K}$, a nonsingular linear map $g : V \to V$ is said to leave the bilinear form invariant if

$$(g(u), g(v)) = (u, v)$$

for all $u$ and $v$ in $V$. Fix an ordered basis $\Gamma$ of $V$, let $A$ be the matrix of the bilinear form in this basis, let $g'$ be the member of $\mathrm{GL}(n, \mathbb{K})$ corresponding to $g$ (its matrix relative to $\Gamma$ in both domain and range), and abbreviate the coordinate column vector $\begin{pmatrix} w \\ \Gamma \end{pmatrix}$ as $w'$ for any $w$ in $V$. To translate the invariance condition into one concerning matrices, we use the formula $(u, v) = u'^tAv'$, the corresponding formula for $(g(u), g(v))$, and the formula $g(w)' = g'w'$ from Theorem 2.14. Then we obtain $u'^tg'^tAg'v' = u'^tAv'$. Taking $u$ to be the $i$th member of the ordered basis $\Gamma$ and $v$ to be the $j$th member, we obtain equality of the $(i, j)$th entry of the two matrices $g'^tAg'$ and $A$. Thus the matrix form of the invariance condition is that a nonsingular matrix $g'$ satisfy

$$g'^tAg' = A.$$

We know that changing the ordered basis $\Gamma$ amounts to replacing $A$ by $M^tAM$ for some nonsingular matrix $M$. If $g'$ satisfies the invariance condition $g'^tAg' = A$ relative to $A$, then $M^{-1}g'M$ satisfies

$$(M^{-1}g'M)^t(M^tAM)(M^{-1}g'M) = M^tAM.$$

Thus we are led to a conjugate subgroup within $\mathrm{GL}(n, \mathbb{K})$. A conjugate subgroup is not something substantially new, and thus we might as well make a convenient choice of basis so that $A$ looks particularly special.
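The conjugation argument can be checked on a small case. In this sketch (exact arithmetic over $\mathbb{Q}$; matrices chosen for illustration, helper names our own), `g` preserves the form with matrix $A = \mathrm{diag}(1, -1)$ because $(5/4)^2 - (3/4)^2 = 1$. After the change of basis $A \mapsto M^tAM$, it is $M^{-1}gM$, not $g$ itself, that preserves the new matrix.

```python
from fractions import Fraction

def transpose(m):
    return [list(row) for row in zip(*m)]

def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]

def inverse_2x2(m):
    (a, b), (c, d) = m
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[1, 0], [0, -1]]
g = [[Fraction(5, 4), Fraction(3, 4)],
     [Fraction(3, 4), Fraction(5, 4)]]        # satisfies g^t A g = A
M = [[1, 1], [0, 2]]                          # an arbitrary nonsingular change of basis
A_new = matmul(matmul(transpose(M), A), M)    # matrix of the same form in the new basis
h = matmul(matmul(inverse_2x2(M), g), M)      # the conjugate M^{-1} g M
print(matmul(matmul(transpose(h), A_new), h) == A_new)
```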

The interesting cases are that the given bilinear form is symmetric or alternating, hence that the matrix $A$ is symmetric or alternating. Let us restrict our attention to them. The left and right radicals coincide in these cases, and the first thing to do is to take the two-sided radical into account. Returning to the original bilinear form, we write $V = \mathrm{rad} \oplus S$, where rad is the radical and $S$ is some vector subspace of $V$, and we choose an ordered basis $(v_1, \dots, v_p, v_{p+1}, \dots, v_n)$ such that $v_1, \dots, v_p$ are in $S$ and $v_{p+1}, \dots, v_n$ are in rad. Then $(v_i, v_j) = 0$ if $i > p$ or $j > p$, and consequently $A$ has its only nonzero entries in the upper left $p$-by-$p$ block. The same argument as in the proofs of Theorems 6.5 and 6.7 shows that the restriction of the bilinear form to $S$ is nondegenerate, and consequently the upper left $p$-by-$p$ block of $A$ is nonsingular. Changing notation, write

$$g = \begin{pmatrix} g_{11} & g_{12} \\ g_{21} & g_{22} \end{pmatrix}$$

with $g_{11}$ of size $p$-by-$p$, suppose that $\begin{pmatrix} A & 0 \\ 0 & 0 \end{pmatrix}$ is another matrix written in the same block form, suppose that the $p$-by-$p$ matrix $A$ is nonsingular, and suppose that $g^t\begin{pmatrix} A & 0 \\ 0 & 0 \end{pmatrix}g = \begin{pmatrix} A & 0 \\ 0 & 0 \end{pmatrix}$. The left side equals $\begin{pmatrix} g_{11}^tAg_{11} & g_{11}^tAg_{12} \\ g_{12}^tAg_{11} & g_{12}^tAg_{12} \end{pmatrix}$, and from this brief computation we find that necessary and sufficient conditions on a nonsingular $g$ are that $g_{11}$ be nonsingular and have $g_{11}^tAg_{11} = A$, that $g_{12} = 0$, that $g_{22}$ be arbitrary nonsingular, and that $g_{21}$ be arbitrary. In other words, the only interesting condition $g_{11}^tAg_{11} = A$ is a reflection of what happens in the nonsingular case.

Consequently the interesting cases are that the given bilinear form is nondegenerate, as well as either symmetric or alternating. If $A$ is symmetric and nonsingular, then the group of all nonsingular matrices $g$ such that $g^tAg = A$ is called the orthogonal group relative to $A$. If $A$ is alternating and nonsingular, then the group of all nonsingular matrices $g$ such that $g^tAg = A$ is called the symplectic group relative to $A$.

For the symplectic case it is customary to invoke Theorem 6.7 and take $A$ to be

$$\begin{pmatrix} J & & \\ & \ddots & \\ & & J \end{pmatrix}, \qquad J = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix},$$

except possibly for a permutation of the rows and columns and possibly for a multiplication by $-1$. Two conflicting notations are in common use for the symplectic group of $2n$-by-$2n$ matrices, namely $\mathrm{Sp}(n, \mathbb{K})$ and $\mathrm{Sp}(2n, \mathbb{K})$, and one always has to check a particular author's definitions.
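Membership in the symplectic group relative to $\mathrm{diag}(J, \dots, J)$ is easy to test directly. The sketch below uses integer matrices and illustrative helper names. As a nontrivial member it builds a symplectic transvection $T(x) = x + c\,(w, x)\,w$, whose invariance follows from $(x, w) = -(w, x)$; the transvection is a standard example that we bring in for illustration and is not discussed in the text.

```python
def transpose(m):
    return [list(row) for row in zip(*m)]

def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]

def standard_symplectic(k):
    """2k-by-2k block-diagonal matrix diag(J, ..., J) with J = [[0, 1], [-1, 0]]."""
    n = 2 * k
    A = [[0] * n for _ in range(n)]
    for b in range(k):
        A[2 * b][2 * b + 1] = 1
        A[2 * b + 1][2 * b] = -1
    return A

def preserves(g, A):
    """True when g^t A g = A, i.e. g leaves the form u^t A v invariant."""
    return matmul(matmul(transpose(g), A), g) == A

def transvection(A, w, c):
    """T = I + c * w (w^t A), so that T(x) = x + c (w, x) w with (w, x) = w^t A x."""
    n = len(A)
    wA = [sum(w[i] * A[i][j] for i in range(n)) for j in range(n)]   # row vector w^t A
    return [[int(r == s) + c * w[r] * wA[s] for s in range(n)] for r in range(n)]

A = standard_symplectic(2)
T = transvection(A, [1, 2, 0, 1], 3)
D = [[2, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
print(preserves(T, A), preserves(D, A))
```

In the 2-by-2 case $g^tJg = (\det g)J$, so there the symplectic group consists exactly of the matrices of determinant 1.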

For the orthogonal case the notation is less standardized. Theorem 6.5 says that we may take A to be diagonal except when $\mathbb{K}$ has characteristic 2. But the theorem does not tell us exactly which A's are representative of the same bilinear form. When $\mathbb{K} = \mathbb{C}$, we know that we can take A to be the identity matrix I. The group is known as the complex orthogonal group and is denoted by $O(n, \mathbb{C})$. When $\mathbb{K} = \mathbb{R}$, we can take A to be diagonal with diagonal entries $\pm 1$. Sylvester's Law (Theorem 6.6) says that the form determines the number of $+1$'s and the number of $-1$'s. The groups are called indefinite orthogonal groups and are denoted by $O(p, q)$, where p is the number of $+1$'s and q is the number of $-1$'s. When q = 0, we obtain the ordinary orthogonal group of matrices relative to an inner product.
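Here is a small numerical check for the indefinite orthogonal case. In this Python sketch (the names `diag_form` and `preserves` are ours), exact rational arithmetic from the standard `fractions` module shows that a hyperbolic rotation with entries 5/4 and 3/4 lies in $O(1, 1)$ but not in the ordinary orthogonal group $O(2)$.

```python
from fractions import Fraction

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def diag_form(p, q):
    """The diagonal matrix with p entries +1 followed by q entries -1."""
    n = p + q
    return [[(1 if i < p else -1) if i == j else 0 for j in range(n)] for i in range(n)]

def preserves(g, a):
    """True exactly when g^t a g == a."""
    return matmul(transpose(g), matmul(a, g)) == a

# A hyperbolic rotation with rational entries; note (5/4)^2 - (3/4)^2 = 1.
boost = [[Fraction(5, 4), Fraction(3, 4)],
         [Fraction(3, 4), Fraction(5, 4)]]

print(preserves(boost, diag_form(1, 1)))  # True: boost is in O(1, 1)
print(preserves(boost, diag_form(2, 0)))  # False: boost is not in O(2)
```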

A similar analysis applies to Hermitian forms. The field is now $\mathbb{C}$, the invariance condition with the form is still $(g(u), g(v)) = (u, v)$, and the corresponding condition with matrices is $g^* A g = A$, where $g^*$ is the conjugate transpose of g. The interesting case is that the Hermitian form is nondegenerate. Proposition 6.8 and Sylvester's Law (Theorem 6.9) together show that we may take A to be diagonal with diagonal entries $\pm 1$ and that the Hermitian form determines the number of $+1$'s and the number of $-1$'s. The groups are the indefinite unitary groups and are denoted by $U(p, q)$, where p is the number of $+1$'s and q is the number of $-1$'s. When q = 0, we obtain the ordinary unitary group of matrices relative to an inner product.
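The one change in the Hermitian case is that the transpose is replaced by the conjugate transpose. The Python sketch below (plain lists and built-in complex numbers; `conj_transpose` is an illustrative helper) checks a matrix in $U(1, 1)$ and shows that the same matrix fails the bilinear condition $g^t A g = A$.

```python
def conj_transpose(m):
    """g*: the transpose of m with every entry complex-conjugated."""
    return [[complex(x).conjugate() for x in col] for col in zip(*m)]

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

A = [[1, 0], [0, -1]]   # the Hermitian form with p = q = 1

# |1 + i|^2 - |1|^2 = 1, and the off-diagonal terms cancel.
g = [[1 + 1j, 1],
     [1, 1 - 1j]]

print(matmul(conj_transpose(g), matmul(A, g)) == A)  # True: g is in U(1, 1)
print(matmul(transpose(g), matmul(A, g)) == A)       # False: plain transpose is the wrong test
```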
6. Tensor Product of Two Vector Spaces

If E is a vector space over $\mathbb{K}$, then the set of all bilinear forms on E is a vector space under addition and scalar multiplication of the values, i.e., it is a vector subspace of the set of all functions from $E \times E$ into $\mathbb{K}$. In this section we introduce a vector space called the “tensor product” of E with itself, whose dual, even if E is infinite-dimensional, is canonically isomorphic to this vector space of bilinear forms.

Matters will be clearer if we work initially with something slightly more general than bilinear forms on a single vector space E. Thus fix a field $\mathbb{K}$, and let E and F be vector spaces over $\mathbb{K}$. A function from $E \times F$ into a vector space U over $\mathbb{K}$ is said to be bilinear if it is linear in each of the two variables when the other one is held fixed. Such a space of bilinear functions is a vector space over $\mathbb{K}$ under addition and scalar multiplication of the values. The bilinear functions are called bilinear forms when the range space U is $\mathbb{K}$ itself. More generally, if $E_1, \dots, E_k$ are vector spaces over $\mathbb{K}$, a function from $E_1 \times \cdots \times E_k$ into a vector space over $\mathbb{K}$ is said to be k-linear or k-multilinear if it is linear in each of its k variables when the other $k - 1$ variables are held fixed. Again the word “form” is used in the scalar-valued case, and all of these spaces of multilinear functions are vector spaces over $\mathbb{K}$.

In this section we shall introduce the tensor product of two vector spaces E and F over $\mathbb{K}$, ultimately denoting it by $E \otimes_{\mathbb{K}} F$. The dual of this tensor product will be canonically isomorphic to the vector space of bilinear forms on $E \times F$. More generally the space of linear functions from the tensor product into a vector space U will be canonically isomorphic to the vector space of bilinear functions on $E \times F$ with values in U.

Following the habit encouraged by Chapter IV, we want to arrange that tensor product is a functor. If $\mathcal{V}$ denotes the category of vector spaces over $\mathbb{K}$ and if $\mathcal{V} \times \mathcal{V}$ denotes the category described in Section IV.11 as $\mathcal{V}^S$ for a two-element set S, then tensor product is to be a functor from $\mathcal{V} \times \mathcal{V}$ into $\mathcal{V}$. Hence we will want to examine the effect of tensor products on morphisms, i.e., on linear maps.

As  in  similar  constructions  in  Chapter  IV,  the  effect  of  tensor  product  on  linear 
maps  is  captured  by  defining  the  tensor  product  by  means  of  a  universal  mapping 
property.  The  appropriate  universal  mapping  property  rephrases  the  statement 
above  that  the  space  of  linear  functions  from  the  tensor  product  into  any  vector 
space  U  is  canonically  isomorphic  to  the  vector  space  of  bilinear  functions  on 
$E \times F$ with values in U.

If E and F are vector spaces over $\mathbb{K}$, a tensor product of E and F is a pair $(V, \iota)$ consisting of a vector space V over $\mathbb{K}$ together with a bilinear function $\iota : E \times F \to V$, with the following universal mapping property: whenever b is a bilinear mapping of $E \times F$ into a vector space U over $\mathbb{K}$, then there exists a unique linear mapping B of V into U such that the diagram in Figure 6.1 commutes, i.e., such that $B\iota = b$ holds in the diagram. When ι is understood, one frequently refers to V itself as the tensor product. The linear mapping $B : V \to U$ is called the linear extension of b to the tensor product.

Figure 6.1. Universal mapping property of a tensor product. [Diagram: $b : E \times F \to U$ factors as $b = B\iota$, with $\iota : E \times F \to V$ and $B : V \to U$.]

Theorem 6.10. If E and F are vector spaces over $\mathbb{K}$, then a tensor product of E and F exists and is unique up to canonical isomorphism in this sense: if $(V_1, \iota_1)$ and $(V_2, \iota_2)$ are tensor products, then there exists a unique linear mapping $B : V_2 \to V_1$ with $B\iota_2 = \iota_1$, and B is an isomorphism. Any tensor product is spanned linearly by the image of $E \times F$ in it.

Remarks. As usual, uniqueness will follow readily from the universal mapping property. What is really needed is a proof of existence. This will be carried
out  by  an  explicit  construction.  Later,  in  Chapter  X,  we  shall  reintroduce  tensor 
products,  taking  the  basic  construction  to  be  that  of  the  tensor  product  of  two 
abelian  groups,  and  then  the  tensor  product  of  two  vector  spaces  will  in  effect 
be  obtained  in  a  slightly  different  way.  However,  the  exact  construction  does  not 
matter,  only  the  existence;  the  uniqueness  allows  us  to  match  the  results  of  any 
two  constructions. 

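Figure 6.2. Diagrams for uniqueness of a tensor product. [Two diagrams: $\iota_2 : E \times F \to V_2$ extended through $(V_1, \iota_1)$, and $\iota_1 : E \times F \to V_1$ extended through $(V_2, \iota_2)$.]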

Proof of uniqueness. Let $(V_1, \iota_1)$ and $(V_2, \iota_2)$ be tensor products. Set up the diagrams in Figure 6.2, and use the universal mapping property to obtain linear maps $B_2 : V_1 \to V_2$ and $B_1 : V_2 \to V_1$ extending $\iota_2$ and $\iota_1$. Then $B_1B_2 : V_1 \to V_1$ has $B_1B_2\iota_1 = B_1\iota_2 = \iota_1$, and $1_{V_1} : V_1 \to V_1$ has $1_{V_1}\iota_1 = \iota_1$. By the assumed uniqueness within the universal mapping property, $B_1B_2 = 1_{V_1}$ on $V_1$. Similarly $B_2B_1 = 1_{V_2}$ on $V_2$. Then $B_1 : V_2 \to V_1$ gives the canonical isomorphism. Because of the isomorphism the image of $E \times F$ will span an arbitrary tensor product if it spans some particular tensor product. □
Proof of existence. Let $V_1 = \bigoplus_{(e,f)} \mathbb{K}(e, f)$, the direct sum being taken over all ordered pairs (e, f) with $e \in E$ and $f \in F$. Then $V_1$ is a vector space over $\mathbb{K}$ with a basis consisting of all ordered pairs (e, f). We think of all identities that the elements of $V_1$ must satisfy to be a tensor product, writing each as some expression set equal to 0, and then we assemble those expressions into a vector subspace to factor out from $V_1$. Namely, let $V_0$ be the vector subspace of $V_1$ generated by all elements of any of the kinds

$$\begin{aligned} &(e_1 + e_2, f) - (e_1, f) - (e_2, f), \\ &(ce, f) - c(e, f), \\ &(e, f_1 + f_2) - (e, f_1) - (e, f_2), \\ &(e, cf) - c(e, f), \end{aligned}$$

the understanding being that c is in $\mathbb{K}$, the elements $e, e_1, e_2$ are in E, and the elements $f, f_1, f_2$ are in F. Define $V = V_1/V_0$, and define $\iota : E \times F \to V_1/V_0$ by $\iota(e, f) = (e, f) + V_0$. We shall prove that $(V, \iota)$ is a tensor product of E and F. The definitions show that ι is bilinear and that the image of ι spans V linearly.

Let $b : E \times F \to U$ be given as in Figure 6.1. To see that a linear extension B exists and is unique, define $B_1$ on $V_1$ by

$$B_1\Big(\sum_{\text{finite}} c_i(e_i, f_i)\Big) = \sum_{\text{finite}} c_i\, b(e_i, f_i).$$

The bilinearity of b shows that $B_1$ maps $V_0$ to 0. By Proposition 2.25, $B_1$ descends to a linear map $B : V_1/V_0 \to U$, and we have $B\iota = b$. Hence B exists as required.

To check uniqueness of B, we observe again that the cosets $(e, f) + V_0$ within $V_1/V_0$ span V; since commutativity of the diagram in Figure 6.1 forces

$$B((e, f) + V_0) = B(\iota(e, f)) = b(e, f),$$

B is unique. This completes the proof. □

A tensor product of E and F is denoted by $(E \otimes_{\mathbb{K}} F, \iota)$, with the bilinear map ι given by $\iota(e, f) = e \otimes f$; the map ι is frequently dropped from the notation when there is no chance of ambiguity. The tensor product that was constructed in the proof of existence in Theorem 6.10 is not given any special notation to distinguish it from any other tensor product. The elements $e \otimes f$ span $E \otimes_{\mathbb{K}} F$, as was noted in the statement of the theorem. Elements of the form $e \otimes f$ are sometimes called pure tensors.

Not every element need be a pure tensor, but every element in $E \otimes_{\mathbb{K}} F$ is a finite sum of pure tensors. We shall see in Proposition 6.14 that if $\{u_i\}$ is a basis of E and $\{v_j\}$ is a basis of F, then the pure tensors $u_i \otimes v_j$ form a basis of $E \otimes_{\mathbb{K}} F$. In particular the dimension of the tensor product is the product of the dimensions of the factors. We could have defined the tensor product in this way, by taking bases and declaring that the $u_i \otimes v_j$ are to be a basis of the desired space. The difficulty is that we would be forever wedded to our choice of those particular bases, or we would constantly have to prove that our definitions are independent of bases. The definition by means of Theorem 6.10 avoids this difficulty.
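The following Python sketch is the basis-dependent coordinate model just described, used only as an illustration and not as a definition: it fixes $E = \mathbb{K}^m$ and $F = \mathbb{K}^n$ with their standard bases and represents $e \otimes f$ by the Kronecker product of coordinate vectors, the entry in position $i \cdot n + j$ being $e_i f_j$. The helper names are ours. It confirms that the $mn$ pure tensors $u_i \otimes v_j$ are exactly the standard basis of $\mathbb{K}^{mn}$ and that $(e, f) \mapsto e \otimes f$ is linear in the first variable.

```python
def kron_vec(x, y):
    """Coordinates of x ⊗ y in K^(m*n): position i*n + j holds x[i] * y[j]."""
    return [a * b for a in x for b in y]

def standard_basis(n):
    return [[1 if i == j else 0 for i in range(n)] for j in range(n)]

def add(x, y):
    return [a + b for a, b in zip(x, y)]

def scale(c, x):
    return [c * a for a in x]

m, n = 2, 3
pure = [kron_vec(u, v) for u in standard_basis(m) for v in standard_basis(n)]

# The m*n pure tensors u_i ⊗ v_j are exactly the standard basis of K^(m*n).
print(len(pure), pure == standard_basis(m * n))   # 6 True

# Linearity in the first variable (the second variable is analogous).
x, x2, y, c = [1, -2], [3, 5], [2, 0, 7], 4
print(kron_vec(add(x, scale(c, x2)), y) == add(kron_vec(x, y), scale(c, kron_vec(x2, y))))  # True
```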

To make tensor product $(E, F) \mapsto E \otimes_{\mathbb{K}} F$ into a functor, we have to describe the effect on linear mappings. To aid in that discussion, let us reintroduce some notation first used in Chapter II: if U and V are vector spaces over $\mathbb{K}$, then $\operatorname{Hom}_{\mathbb{K}}(U, V)$ is defined to be the vector space of $\mathbb{K}$ linear maps from U to V.

Corollary 6.11. If E, F, and V are vector spaces over $\mathbb{K}$, then the vector space $\operatorname{Hom}_{\mathbb{K}}(E \otimes_{\mathbb{K}} F, V)$ is canonically isomorphic (via restriction to pure tensors) to the vector space of all V-valued bilinear functions on $E \times F$.

PROOF. Restriction is a linear mapping from $\operatorname{Hom}_{\mathbb{K}}(E \otimes_{\mathbb{K}} F, V)$ to the vector space of all V-valued bilinear functions on $E \times F$, and it is one-one since the image of $E \times F$ in $E \otimes_{\mathbb{K}} F$ spans $E \otimes_{\mathbb{K}} F$. It is onto since any bilinear function from $E \times F$ to V has a linear extension to $E \otimes_{\mathbb{K}} F$, by Theorem 6.10. □

Corollary 6.12. If E and F are vector spaces over $\mathbb{K}$, then the vector space of all bilinear forms on $E \times F$ is canonically isomorphic to $(E \otimes_{\mathbb{K}} F)'$, the dual of the vector space $E \otimes_{\mathbb{K}} F$.

PROOF. This is the special case of Corollary 6.11 in which $V = \mathbb{K}$. □
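To see Corollary 6.12 concretely, assume for illustration that $E = \mathbb{K}^m$ and $F = \mathbb{K}^n$ and that $e \otimes f$ is represented by the Kronecker product of coordinate vectors (entry $i \cdot n + j$ equal to $e_i f_j$). A bilinear form is then $b(e, f) = e^t M f$ for an m-by-n matrix M, and the corresponding member of $(E \otimes_{\mathbb{K}} F)'$ is the linear functional whose coefficient vector is M read row by row. The helper names in this Python sketch are hypothetical.

```python
def kron_vec(x, y):
    """Coordinates of x ⊗ y: position i*len(y) + j holds x[i] * y[j]."""
    return [a * b for a in x for b in y]

def bilinear_form(M, e, f):
    """b(e, f) = e^t M f for an m-by-n matrix M."""
    return sum(e[i] * M[i][j] * f[j] for i in range(len(e)) for j in range(len(f)))

def linear_extension(M):
    """The linear functional B on K^(m*n) with B(e ⊗ f) = b(e, f)."""
    flat = [entry for row in M for entry in row]   # same (i, j) -> i*n + j order as kron_vec
    return lambda t: sum(c * x for c, x in zip(flat, t))

M = [[1, 2, 0],
     [0, -1, 3]]
e, f = [2, 5], [1, -1, 4]
B = linear_extension(M)
print(bilinear_form(M, e, f), B(kron_vec(e, f)))  # 63 63
```

Because B is defined on all of $\mathbb{K}^{mn}$, it also assigns values to sums of pure tensors, and those values are forced by linearity, which is the uniqueness half of the universal mapping property.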

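Corollary 6.13. If E, F, and V are vector spaces over $\mathbb{K}$, then there is a canonical $\mathbb{K}$ linear isomorphism Φ of left side to right side in

$$\operatorname{Hom}_{\mathbb{K}}(E \otimes_{\mathbb{K}} F, V) \cong \operatorname{Hom}_{\mathbb{K}}(E, \operatorname{Hom}_{\mathbb{K}}(F, V))$$

such that

$$\Phi(\varphi)(e)(f) = \varphi(e \otimes f)$$

for all $\varphi \in \operatorname{Hom}_{\mathbb{K}}(E \otimes_{\mathbb{K}} F, V)$, $e \in E$, and $f \in F$.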

Remark.  This  result  is  just  a  restatement  of  Corollary  6.11,  but  let  us  prove  it 
anyway,  writing  the  proof  in  the  language  of  the  statement. 

PROOF. The map Φ is well defined and $\mathbb{K}$ linear, and it carries the left side to the right side. For ψ in the right side, define $\psi^\#(e, f) = \psi(e)(f)$. Then $\psi^\#$ is a bilinear map from $E \times F$ into V, and we let $\Psi(\psi)$ be the linear extension of $\psi^\#$ from $E \otimes_{\mathbb{K}} F$ into V given in Theorem 6.10. Then Ψ is a two-sided inverse to Φ, and the corollary follows. □
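Corollary 6.13 is the linear-algebra form of what programmers call currying. In the sketch below we assume, for illustration only, that $E = \mathbb{K}^m$, $F = \mathbb{K}^n$, $V = \mathbb{K}^d$, that $e \otimes f$ is the Kronecker product of coordinate vectors, and that φ is given by a d-by-mn matrix P. Then $\Phi(\varphi)(e)$ is the d-by-n matrix of $f \mapsto \varphi(e \otimes f)$, and the inverse rebuilds P column by column from the values $\psi(u_i)(v_j)$, as in the proof. The function names `Phi` and `Psi` are ours.

```python
def kron_vec(x, y):
    """Coordinates of x ⊗ y: position i*len(y) + j holds x[i] * y[j]."""
    return [a * b for a in x for b in y]

def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def Phi(P, m, n):
    """Phi(phi)(e) as a d-by-n matrix: the map f |-> phi(e ⊗ f)."""
    def at(e):
        return [[sum(e[i] * P[r][i * n + j] for i in range(m)) for j in range(n)]
                for r in range(len(P))]
    return at

def Psi(psi, m, n, d):
    """Inverse of Phi: column i*n + j of the rebuilt matrix is psi(u_i)(v_j)."""
    P = [[0] * (m * n) for _ in range(d)]
    for i in range(m):
        C = psi([1 if k == i else 0 for k in range(m)])   # psi(u_i) as a d-by-n matrix
        for j in range(n):
            for r in range(d):
                P[r][i * n + j] = C[r][j]
    return P

m, n = 2, 3
P = [[1, 0, 2, -1, 3, 0],      # matrix of phi : K^2 ⊗ K^3 -> K^2
     [0, 4, 1, 1, 0, -2]]
e, f = [2, -1], [1, 3, 5]

print(matvec(Phi(P, m, n)(e), f) == matvec(P, kron_vec(e, f)))  # True
print(Psi(Phi(P, m, n), m, n, len(P)) == P)                     # True
```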
Let us now make $(E, F) \mapsto E \otimes_{\mathbb{K}} F$ into a covariant functor. If $(E_1, F_1)$ and $(E_2, F_2)$ are objects in $\mathcal{V} \times \mathcal{V}$, i.e., if they are two ordered pairs of vector spaces, then a morphism from the first to the second is a pair (L, M) of linear maps of the form $L : E_1 \to E_2$ and $M : F_1 \to F_2$. To (L, M), we are to associate a linear map from $E_1 \otimes_{\mathbb{K}} F_1$ into $E_2 \otimes_{\mathbb{K}} F_2$; this linear map will be denoted by $L \otimes M$. We use Corollary 6.11 to define $L \otimes M$ as the member of $\operatorname{Hom}_{\mathbb{K}}(E_1 \otimes_{\mathbb{K}} F_1, E_2 \otimes_{\mathbb{K}} F_2)$ that corresponds under restriction to the bilinear map $(e_1, f_1) \mapsto L(e_1) \otimes M(f_1)$ of $E_1 \times F_1$ into $E_2 \otimes_{\mathbb{K}} F_2$. In terms of pure tensors, the map $L \otimes M$ satisfies

$$(L \otimes M)(e_1 \otimes f_1) = L(e_1) \otimes M(f_1),$$

and this formula completely determines $L \otimes M$ because of the uniqueness of linear extensions of bilinear maps.

To check that this definition of the effect of tensor product on pairs of linear maps makes $(E, F) \mapsto E \otimes_{\mathbb{K}} F$ into a covariant functor, we have to check the effect on the identity map and the effect on composition. For the effect on the identity map $(1_{E_1}, 1_{F_1})$ when $E_1 = E_2$ and $F_1 = F_2$, we see from the above displayed formula that $(1_{E_1} \otimes 1_{F_1})(e_1 \otimes f_1) = 1_{E_1}(e_1) \otimes 1_{F_1}(f_1) = e_1 \otimes f_1 = 1_{E_1 \otimes_{\mathbb{K}} F_1}(e_1 \otimes f_1)$. Since elements of the form $e_1 \otimes f_1$ span $E_1 \otimes_{\mathbb{K}} F_1$, we conclude that $1_{E_1} \otimes 1_{F_1} = 1_{E_1 \otimes_{\mathbb{K}} F_1}$.

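For the effect on composition, let $(L_1, M_1) : (E_1, F_1) \to (E_2, F_2)$ and $(L_2, M_2) : (E_2, F_2) \to (E_3, F_3)$ be given. Then we have

$$\begin{aligned} (L_2 \otimes M_2)(L_1 \otimes M_1)(e_1 \otimes f_1) &= (L_2 \otimes M_2)(L_1(e_1) \otimes M_1(f_1)) \\ &= (L_2L_1)(e_1) \otimes (M_2M_1)(f_1) = (L_2L_1 \otimes M_2M_1)(e_1 \otimes f_1). \end{aligned}$$

Since elements of the form $e_1 \otimes f_1$ span $E_1 \otimes_{\mathbb{K}} F_1$, we conclude that

$$(L_2 \otimes M_2)(L_1 \otimes M_1) = L_2L_1 \otimes M_2M_1.$$

Therefore $(E, F) \mapsto E \otimes_{\mathbb{K}} F$ is a covariant functor.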
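In coordinates where $e \otimes f$ is represented by the Kronecker product of coordinate vectors (an assumption made for illustration), $L \otimes M$ is the Kronecker product of the matrices of L and M. Under that assumption the following Python sketch (plain lists; helper names are ours) checks the defining formula $(L \otimes M)(e \otimes f) = L(e) \otimes M(f)$ and the two functor properties just proved.

```python
def kron(A, B):
    """Kronecker product; entry (i*p + k, j*q + l) is A[i][j] * B[k][l] when B is p-by-q."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def kron_vec(x, y):
    """Coordinates of x ⊗ y: position i*len(y) + j holds x[i] * y[j]."""
    return [a * b for a in x for b in y]

def matmul(A, B):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*B)] for row in A]

def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

L1 = [[1, 2], [0, -1]]          # E1 = K^2 -> E2 = K^2
M1 = [[1, 0, 2], [3, 1, 0]]     # F1 = K^3 -> F2 = K^2
L2 = [[2, 1], [1, 1], [0, 3]]   # E2 = K^2 -> E3 = K^3
M2 = [[1, -1], [2, 0]]          # F2 = K^2 -> F3 = K^2
e, f = [1, -2], [3, 0, 1]

# Defining formula: (L1 ⊗ M1)(e ⊗ f) = L1(e) ⊗ M1(f)
print(matvec(kron(L1, M1), kron_vec(e, f)) == kron_vec(matvec(L1, e), matvec(M1, f)))
# Composition: (L2 ⊗ M2)(L1 ⊗ M1) = L2 L1 ⊗ M2 M1
print(matmul(kron(L2, M2), kron(L1, M1)) == kron(matmul(L2, L1), matmul(M2, M1)))
# Identity: 1 ⊗ 1 is the identity of K^2 ⊗ K^3
print(kron(identity(2), identity(3)) == identity(6))
```

All three lines print `True`; the same `kron` also illustrates that $L_1 \mapsto L_1 \otimes M_1$ is linear, since each entry of the Kronecker product is linear in the entries of $L_1$.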

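In particular, $E \mapsto E \otimes_{\mathbb{K}} F$ and $F \mapsto E \otimes_{\mathbb{K}} F$ are covariant functors from $\mathcal{V}$ into itself. For these two functors from $\mathcal{V}$ into itself, the effect on linear mappings is especially nice, namely that

$$L_1 \mapsto L_1 \otimes M_1 \ \text{ is } \mathbb{K} \text{ linear from } \operatorname{Hom}_{\mathbb{K}}(E_1, E_2) \text{ into } \operatorname{Hom}_{\mathbb{K}}(E_1 \otimes_{\mathbb{K}} F_1, E_2 \otimes_{\mathbb{K}} F_2),$$

$$M_1 \mapsto L_1 \otimes M_1 \ \text{ is } \mathbb{K} \text{ linear from } \operatorname{Hom}_{\mathbb{K}}(F_1, F_2) \text{ into } \operatorname{Hom}_{\mathbb{K}}(E_1 \otimes_{\mathbb{K}} F_1, E_2 \otimes_{\mathbb{K}} F_2).$$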


To prove the first of these assertions, for example, we observe that the sum of the linear extensions of

$$(e_1, f_1) \mapsto L_1(e_1) \otimes M_1(f_1) \qquad\text{and}\qquad (e_1, f_1) \mapsto L_1'(e_1) \otimes M_1(f_1)$$

is a linear extension of $(e_1, f_1) \mapsto (L_1 + L_1')(e_1) \otimes M_1(f_1)$, and the uniqueness in the universal mapping property implies that $(L_1 + L_1') \otimes M_1 = L_1 \otimes M_1 + L_1' \otimes M_1$. Similar remarks apply to multiplication by scalars.

Let us mention some identities satisfied by $\otimes_{\mathbb{K}}$. There is a canonical isomorphism

$$E \otimes_{\mathbb{K}} F \cong F \otimes_{\mathbb{K}} E$$

given by taking the linear extension of $(e, f) \mapsto f \otimes e$ as the map from left to right. The linear extension of $(f, e) \mapsto e \otimes f$ gives a two-sided inverse. Category theory has a way of capturing the idea that this isomorphism is systematic, rather than randomly dependent on E and F. The two sides of the above isomorphism may be regarded as the values of the covariant functors $(E, F) \mapsto E \otimes_{\mathbb{K}} F$ and $(E, F) \mapsto F \otimes_{\mathbb{K}} E$. The notion in category theory capturing “systematic” is
called  “naturality.”  It  makes  precise  the  fact  that  the  system  of  isomorphisms 
respects  linear  maps,  as  well  as  the  vector  spaces.  Here  is  the  general  definition. 
Its  usefulness  will  be  examined  later  in  this  section. 

Let $\mathcal{C}$ and $\mathcal{D}$ be two categories, and let $\Phi : \mathcal{C} \to \mathcal{D}$ and $\Psi : \mathcal{C} \to \mathcal{D}$ be covariant functors. Suppose that for each X in $\operatorname{Obj}(\mathcal{C})$, a morphism $T_X$ in $\operatorname{Morph}_{\mathcal{D}}(\Phi(X), \Psi(X))$ is given. Then the system $\{T_X\}$ is called a natural transformation of Φ into Ψ if for each pair of objects $X_1$ and $X_2$ in $\mathcal{C}$ and each h in $\operatorname{Morph}_{\mathcal{C}}(X_1, X_2)$, the diagram in Figure 6.3 commutes. If furthermore each $T_X$ is an isomorphism, then it is immediate that the system $\{T_X^{-1}\}$ is a natural transformation of Ψ into Φ, and we say that $\{T_X\}$ is a natural isomorphism.


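$$\begin{array}{ccc} \Phi(X_1) & \xrightarrow{\ \Phi(h)\ } & \Phi(X_2) \\ \Big\downarrow{\scriptstyle T_{X_1}} & & \Big\downarrow{\scriptstyle T_{X_2}} \\ \Psi(X_1) & \xrightarrow{\ \Psi(h)\ } & \Psi(X_2) \end{array}$$

Figure 6.3. Commutative diagram of a natural transformation $\{T_X\}$.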


If Φ and Ψ are contravariant functors, then the system $\{T_X\}$ is called a natural transformation of Φ into Ψ if the diagram obtained from Figure 6.3 by reversing the horizontal arrows commutes. The system is a natural isomorphism if furthermore each $T_X$ is an isomorphism.

In the case we are studying, we have $\mathcal{C} = \mathcal{V} \times \mathcal{V}$ and $\mathcal{D} = \mathcal{V}$. Objects X in $\mathcal{C}$ are pairs (E, F) of vector spaces, and Φ and Ψ are the covariant functors with $\Phi(E, F) = E \otimes_{\mathbb{K}} F$ and $\Psi(E, F) = F \otimes_{\mathbb{K}} E$. The mapping $T_{(E,F)} : E \otimes_{\mathbb{K}} F \to F \otimes_{\mathbb{K}} E$ is uniquely determined by the condition that $T_{(E,F)}(e \otimes f) = f \otimes e$ for all $e \in E$ and $f \in F$. A morphism of pairs from $(E_1, F_1)$ to $(E_2, F_2)$ is of
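the form h = (L, M) with $L \in \operatorname{Hom}_{\mathbb{K}}(E_1, E_2)$ and $M \in \operatorname{Hom}_{\mathbb{K}}(F_1, F_2)$. Our constructions above show that

$$\Phi(L, M) = L \otimes M \in \operatorname{Hom}_{\mathbb{K}}(E_1 \otimes_{\mathbb{K}} F_1, E_2 \otimes_{\mathbb{K}} F_2)$$

and

$$\Psi(L, M) = M \otimes L \in \operatorname{Hom}_{\mathbb{K}}(F_1 \otimes_{\mathbb{K}} E_1, F_2 \otimes_{\mathbb{K}} E_2).$$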

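In Figure 6.3 the two routes from top left to bottom right in the diagram have

$$T_{(E_2,F_2)}\Phi(L, M)(e_1 \otimes f_1) = T_{(E_2,F_2)}(L \otimes M)(e_1 \otimes f_1) = T_{(E_2,F_2)}(L(e_1) \otimes M(f_1)) = M(f_1) \otimes L(e_1)$$

and

$$\Psi(L, M)T_{(E_1,F_1)}(e_1 \otimes f_1) = \Psi(L, M)(f_1 \otimes e_1) = (M \otimes L)(f_1 \otimes e_1) = M(f_1) \otimes L(e_1).$$

The results are equal, and since the elements $e_1 \otimes f_1$ span $E_1 \otimes_{\mathbb{K}} F_1$, the diagram commutes. Consequently the isomorphism

$$E \otimes_{\mathbb{K}} F \cong F \otimes_{\mathbb{K}} E$$

is natural in the pair (E, F).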
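Assume for illustration that $E = \mathbb{K}^m$ and $F = \mathbb{K}^n$ and that $e \otimes f$ is the Kronecker product of coordinate vectors. Then $T_{(E,F)}$ is the permutation matrix that moves the coordinate in position $i \cdot n + j$ to position $j \cdot m + i$, and $L \otimes M$ is the Kronecker product of matrices. The Python sketch below (`swap_matrix` and the other helpers are illustrative names) checks $T(e \otimes f) = f \otimes e$ and then the naturality square $T_{(E_2,F_2)}(L \otimes M) = (M \otimes L)T_{(E_1,F_1)}$ for one particular L and M.

```python
def kron(A, B):
    """Kronecker product; entry (i*p + k, j*q + l) is A[i][j] * B[k][l] when B is p-by-q."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def kron_vec(x, y):
    return [a * b for a in x for b in y]

def matmul(A, B):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*B)] for row in A]

def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def swap_matrix(m, n):
    """Matrix of T : K^m ⊗ K^n -> K^n ⊗ K^m with T(e ⊗ f) = f ⊗ e."""
    S = [[0] * (m * n) for _ in range(m * n)]
    for i in range(m):
        for j in range(n):
            S[j * m + i][i * n + j] = 1
    return S

L = [[1, 2, 0], [0, 1, -1]]      # E1 = K^3 -> E2 = K^2
M = [[2, 1], [0, 3], [1, 1]]     # F1 = K^2 -> F2 = K^3
e, f = [1, 0, 2], [3, -1]

print(matvec(swap_matrix(3, 2), kron_vec(e, f)) == kron_vec(f, e))   # True
# Naturality square: T_(E2,F2) (L ⊗ M) = (M ⊗ L) T_(E1,F1)
print(matmul(swap_matrix(2, 3), kron(L, M)) == matmul(kron(M, L), swap_matrix(3, 2)))  # True
```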

Another canonical isomorphism of interest is

$$E \otimes_{\mathbb{K}} \mathbb{K} \cong E.$$

Here the map from left to right is the linear extension of $(e, c) \mapsto ce$, while the map from right to left is $e \mapsto e \otimes 1$. In view of the previous canonical isomorphism, we have $\mathbb{K} \otimes_{\mathbb{K}} E \cong E$ also. Each of these isomorphisms is natural in E.

Next let us consider how $\otimes_{\mathbb{K}}$ interacts with direct sums. The result is that tensor product distributes over direct sums, even infinite direct sums:

$$E \otimes_{\mathbb{K}} \Big(\bigoplus_{s \in S} F_s\Big) \cong \bigoplus_{s \in S} (E \otimes_{\mathbb{K}} F_s).$$

The map from left to right is the linear extension of the bilinear map $(e, \{f_s\}_{s \in S}) \mapsto \{e \otimes f_s\}_{s \in S}$. For the definition of the inverse, the constructions of Section II.6 show that we have only to define the map on each $E \otimes_{\mathbb{K}} F_{s_0}$, where it is the linear extension of $(e, f_{s_0}) \mapsto e \otimes i_{s_0}(f_{s_0})$; here $i_{s_0} : F_{s_0} \to \bigoplus_{s \in S} F_s$ is the one-one linear map carrying the $s_0$th vector space into the direct sum. Once again it is possible to prove that the isomorphism is natural; we omit the details.

It follows from the displayed isomorphism and the isomorphism $E \otimes_{\mathbb{K}} \mathbb{K} \cong E$ that if $\{x_i\}$ is a basis of E and $\{y_j\}$ is a basis of F, then $\{x_i \otimes y_j\}$ is a basis of $E \otimes_{\mathbb{K}} F$. This proves the following result.
Proposition 6.14. If E and F are vector spaces over $\mathbb{K}$, then

$$\dim(E \otimes_{\mathbb{K}} F) = (\dim E)(\dim F).$$

If $\{y_j\}$ is a basis of F, then the most general member of $E \otimes_{\mathbb{K}} F$ is of the form $\sum_j e_j \otimes y_j$ with all $e_j$ in E (only finitely many of them nonzero).
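The second statement of Proposition 6.14 is easy to see in coordinates. Assuming $E = \mathbb{K}^m$, $F = \mathbb{K}^n$ with $\{y_j\}$ the standard basis of F, and $e \otimes f$ represented by the Kronecker product of coordinate vectors, an element t of $E \otimes_{\mathbb{K}} F$ is an m-by-n array read row by row, and $e_j$ is simply its jth column. The Python sketch below (the helper `split_along_F` is hypothetical) recovers the $e_j$ and reassembles t, even though this t is not a pure tensor.

```python
def kron_vec(x, y):
    """Coordinates of x ⊗ y: position i*len(y) + j holds x[i] * y[j]."""
    return [a * b for a in x for b in y]

def split_along_F(t, m, n):
    """Return e_0, ..., e_{n-1} in K^m with t = sum_j e_j ⊗ y_j (y_j standard in K^n)."""
    return [[t[i * n + j] for i in range(m)] for j in range(n)]

m, n = 2, 3
t = [1, 2, 3, 4, 5, 6]          # not a pure tensor: as a 2-by-3 array it has rank 2
es = split_along_F(t, m, n)
ys = [[1 if k == j else 0 for k in range(n)] for j in range(n)]

total = [0] * (m * n)
for e_j, y_j in zip(es, ys):
    total = [a + b for a, b in zip(total, kron_vec(e_j, y_j))]
print(es, total == t)           # [[1, 4], [2, 5], [3, 6]] True
```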

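We turn to a consideration of $\operatorname{Hom}_{\mathbb{K}}$ from the point of view of functors. In the examples in Section IV.11, we saw that $V \mapsto \operatorname{Hom}_{\mathbb{K}}(U, V)$ is a covariant functor from $\mathcal{V}$ to itself and that $U \mapsto \operatorname{Hom}_{\mathbb{K}}(U, V)$ is a contravariant functor from $\mathcal{V}$ to itself. If we are not squeamish about mixing the two types, covariant and contravariant, then we can consider $(U, V) \mapsto \operatorname{Hom}_{\mathbb{K}}(U, V)$ as a functor³ from $\mathcal{V} \times \mathcal{V}$ to $\mathcal{V}$. At any rate if L is in $\operatorname{Hom}_{\mathbb{K}}(U_1, U_2)$ and M is in $\operatorname{Hom}_{\mathbb{K}}(V_1, V_2)$, then $\operatorname{Hom}(L, M)$ carries $\operatorname{Hom}_{\mathbb{K}}(U_2, V_1)$ into $\operatorname{Hom}_{\mathbb{K}}(U_1, V_2)$ and is given by

$$\operatorname{Hom}(L, M)(h) = MhL \qquad \text{for } h \in \operatorname{Hom}_{\mathbb{K}}(U_2, V_1).$$

It is evident that the result is $\mathbb{K}$ linear as a function of h, and hence

$$\operatorname{Hom}(L, M) \text{ is in } \operatorname{Hom}_{\mathbb{K}}\big(\operatorname{Hom}_{\mathbb{K}}(U_2, V_1), \operatorname{Hom}_{\mathbb{K}}(U_1, V_2)\big).$$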
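A small numerical illustration of $\operatorname{Hom}(L, M)$, with linear maps represented by matrices (an assumption made for illustration; `hom` and `lin_comb` are our names): the sketch checks that $\operatorname{Hom}(L, M)(h) = MhL$ has the shape of a map from $U_1$ to $V_2$ and that it depends linearly on h.

```python
def matmul(A, B):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*B)] for row in A]

def hom(L, M):
    """Hom(L, M): h |-> M h L, taking Hom(U2, V1) into Hom(U1, V2)."""
    return lambda h: matmul(M, matmul(h, L))

def lin_comb(c1, X, c2, Y):
    """The matrix c1*X + c2*Y."""
    return [[c1 * x + c2 * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

L = [[1, 0, 2], [0, 1, 1]]      # U1 = K^3 -> U2 = K^2
M = [[1, 1], [2, -1]]           # V1 = K^2 -> V2 = K^2
h1 = [[1, 2], [3, 4]]           # U2 -> V1
h2 = [[0, 1], [-1, 5]]          # U2 -> V1

H = hom(L, M)
print(len(H(h1)), len(H(h1)[0]))                                   # 2 3: a map K^3 -> K^2
print(H(lin_comb(2, h1, -3, h2)) == lin_comb(2, H(h1), -3, H(h2)))  # True
```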

When we look for analogs for the functor $\operatorname{Hom}_{\mathbb{K}}$ of the identity $E \otimes_{\mathbb{K}} \mathbb{K} \cong E$ for the functor $\otimes_{\mathbb{K}}$, we are led to two identities. One is just the definition of the dual of a vector space:

$$\operatorname{Hom}_{\mathbb{K}}(U, \mathbb{K}) = U'.$$

The other is the natural isomorphism

$$\operatorname{Hom}_{\mathbb{K}}(\mathbb{K}, V) \cong V.$$

In the proof of the latter identity, the mapping from left to right is given by sending a linear $h : \mathbb{K} \to V$ to h(1), and the mapping from right to left is given by sending v in V to h with h(c) = cv.

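Next let us consider how $\operatorname{Hom}_{\mathbb{K}}$ interacts with direct sums and direct products. The construction $\operatorname{Hom}_{\mathbb{K}}(U, V)$ distributes over finite direct sums in each variable, but the situation with infinite direct sums or direct products is more subtle. Valid identities are

$$\operatorname{Hom}_{\mathbb{K}}\Big(\bigoplus_{s \in S} U_s, V\Big) \cong \prod_{s \in S} \operatorname{Hom}_{\mathbb{K}}(U_s, V)$$

and

$$\operatorname{Hom}_{\mathbb{K}}\Big(U, \prod_{s \in S} V_s\Big) \cong \prod_{s \in S} \operatorname{Hom}_{\mathbb{K}}(U, V_s),$$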

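³ Readers who care about this point can regard U as in the category $\mathcal{V}^{\mathrm{opp}}$ defined in Problems 78-80 at the end of Chapter IV. Then $(U, V) \mapsto \operatorname{Hom}_{\mathbb{K}}(U, V)$ is a covariant functor from $\mathcal{V}^{\mathrm{opp}} \times \mathcal{V}$ into $\mathcal{V}$.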
and these are natural isomorphisms. Proofs of these identities for all S and
counterexamples  related  to  them  when  S  is  infinite  appear  in  Problems  7-8  at  the 
end  of  the  chapter. 

We have already checked that the isomorphism $E \otimes_{\mathbb{K}} F \cong F \otimes_{\mathbb{K}} E$ is natural in (E, F), and we have asserted naturality in some other situations in which it is easy to check. The next proposition asserts naturality for the identity of Corollary 6.13, which combines $\otimes_{\mathbb{K}}$ and $\operatorname{Hom}_{\mathbb{K}}$ in a nontrivial way. After the proof of the result, we shall digress for a moment to indicate the usefulness of natural isomorphisms.


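Proposition 6.15. Let E, F, V, $E_1$, $F_1$, and $V_1$ be vector spaces over $\mathbb{K}$, and let $L_{E_1} : E_1 \to E$, $L_{F_1} : F_1 \to F$, and $L_V : V \to V_1$ be $\mathbb{K}$ linear maps. Then the isomorphism Φ of Corollary 6.13 is natural in the sense that the diagram

$$\begin{array}{ccc} \operatorname{Hom}_{\mathbb{K}}(E \otimes_{\mathbb{K}} F, V) & \xrightarrow{\ \Phi\ } & \operatorname{Hom}_{\mathbb{K}}(E, \operatorname{Hom}_{\mathbb{K}}(F, V)) \\ \Big\downarrow{\scriptstyle \operatorname{Hom}(L_{E_1} \otimes L_{F_1},\, L_V)} & & \Big\downarrow{\scriptstyle \operatorname{Hom}(L_{E_1},\, \operatorname{Hom}(L_{F_1}, L_V))} \\ \operatorname{Hom}_{\mathbb{K}}(E_1 \otimes_{\mathbb{K}} F_1, V_1) & \xrightarrow{\ \Phi\ } & \operatorname{Hom}_{\mathbb{K}}(E_1, \operatorname{Hom}_{\mathbb{K}}(F_1, V_1)) \end{array}$$

commutes.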

Remarks. Observe that the first two linear maps $L_{E_1}$ and $L_{F_1}$ go in the opposite direction to the two vertical maps, while $L_V$ goes in the same direction
as  the  vertical  maps.  This  is  a  reflection  of  the  fact  that  both  sides  of  the  identity 
in  Corollary  6.13  are  contravariant  in  the  first  two  variables  and  covariant  in  the 
third  variable. 

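PROOF. For φ in $\operatorname{Hom}_{\mathbb{K}}(E \otimes_{\mathbb{K}} F, V)$, $e_1$ in $E_1$, and $f_1$ in $F_1$, we have

$$\begin{aligned} (\operatorname{Hom}(L_{E_1}, \operatorname{Hom}(L_{F_1}, L_V)) \circ \Phi)(\varphi)(e_1)(f_1) &= (\operatorname{Hom}(L_{F_1}, L_V) \circ \Phi(\varphi) \circ L_{E_1})(e_1)(f_1) \\ &= (\operatorname{Hom}(L_{F_1}, L_V) \circ (\Phi(\varphi) \circ L_{E_1}))(e_1)(f_1) \\ &= L_V(\Phi(\varphi)(L_{E_1}(e_1))(L_{F_1}(f_1))) \\ &= L_V(\varphi(L_{E_1}(e_1) \otimes L_{F_1}(f_1))) \\ &= (L_V \circ \varphi \circ (L_{E_1} \otimes L_{F_1}))(e_1 \otimes f_1) \\ &= (\operatorname{Hom}(L_{E_1} \otimes L_{F_1}, L_V)(\varphi))(e_1 \otimes f_1) \\ &= \Phi(\operatorname{Hom}(L_{E_1} \otimes L_{F_1}, L_V)(\varphi))(e_1)(f_1). \end{aligned}$$

This proves the proposition. □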
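The commutativity in Proposition 6.15 can also be tested by treating linear maps as Python functions, so that Φ is literally currying and $\operatorname{Hom}(L, M)$ is literally the composition $h \mapsto M \circ h \circ L$. The sketch below assumes a coordinate model in which $e \otimes f$ is the Kronecker product of coordinate vectors, so that $L_{E_1} \otimes L_{F_1}$ is the Kronecker product of matrices; all names are ours. It evaluates both routes around the diagram at a sample $e_1$ and $f_1$.

```python
def kron(A, B):
    """Kronecker product; entry (i*p + k, j*q + l) is A[i][j] * B[k][l] when B is p-by-q."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def kron_vec(x, y):
    return [a * b for a in x for b in y]

def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def as_map(A):
    return lambda x: matvec(A, x)

def Phi(phi):
    """Corollary 6.13 as currying: Phi(phi)(e)(f) = phi(e ⊗ f)."""
    return lambda e: (lambda f: phi(kron_vec(e, f)))

def hom(L, M):
    """Hom(L, M)(h) = M o h o L, for maps given as Python functions."""
    return lambda h: (lambda x: M(h(L(x))))

A_E = [[1, 2], [0, 1]]             # L_{E1}: E1 = K^2 -> E = K^2
A_F = [[1, 0, -1], [2, 1, 0]]      # L_{F1}: F1 = K^3 -> F = K^2
A_V = [[3, -1]]                    # L_V: V = K^2 -> V1 = K^1
P = [[1, 0, 2, 1], [0, 1, -1, 3]]  # phi: E ⊗ F = K^4 -> V = K^2
phi = as_map(P)

# Top then right: Phi first, then Hom(L_{E1}, Hom(L_{F1}, L_V)).
route1 = hom(as_map(A_E), hom(as_map(A_F), as_map(A_V)))(Phi(phi))
# Left then bottom: Hom(L_{E1} ⊗ L_{F1}, L_V) first, then Phi.
route2 = Phi(hom(as_map(kron(A_E, A_F)), as_map(A_V))(phi))

e1, f1 = [1, -1], [2, 0, 5]
print(route1(e1)(f1) == route2(e1)(f1))   # True
```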
Let us now discuss naturality in a wider context. In a general category $\mathcal{D}$, if we have two objects U and U' such that $\operatorname{Morph}(U, V)$ and $\operatorname{Morph}(U', V)$ have the same cardinality for each object V, then we cannot really say anything about the relationship between U and U'. But if these one-one correspondences of sets have a certain naturality, then, according to Proposition 6.16 below, U and U' are isomorphic objects. Thus naturality of a system of weak-looking set-theoretic isomorphisms can lead to a much stronger-looking isomorphism. Corollary 6.17 goes on to make a corresponding assertion about functors. The assertion about functors in the corollary is a helpful tool for establishing natural isomorphisms of functors, and an example appears below in Proposition 6.20.

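Proposition 6.16. Let $\mathcal{D}$ be a category, and suppose that U and U' are objects in $\mathcal{D}$ with the following property: to each object V in $\mathcal{D}$ corresponds a one-one onto function

$$T_V : \operatorname{Morph}(U, V) \to \operatorname{Morph}(U', V)$$

with the system $\{T_V\}$ natural in V in the sense that whenever σ is in $\operatorname{Morph}(V, V')$, then the diagram

$$\begin{array}{ccc} \operatorname{Morph}(U, V) & \xrightarrow{\ T_V\ } & \operatorname{Morph}(U', V) \\ \Big\downarrow{\scriptstyle \text{left-by-}\sigma} & & \Big\downarrow{\scriptstyle \text{left-by-}\sigma} \\ \operatorname{Morph}(U, V') & \xrightarrow{\ T_{V'}\ } & \operatorname{Morph}(U', V') \end{array}$$

commutes. Then U is isomorphic to U' as an object in $\mathcal{D}$, an isomorphism from U to U' being the member $T_{U'}^{-1}(1_{U'})$ of $\operatorname{Morph}(U, U')$.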

Remarks.

(1) Another way of formulating this result is as follows: Let $\mathcal{D}$ be any category, let $\mathcal{S}$ be the category of sets, and let U and U' be objects in $\mathcal{D}$. Define a covariant functor $H_U : \mathcal{D} \to \mathcal{S}$ by $H_U(V) = \operatorname{Morph}_{\mathcal{D}}(U, V)$ and $H_U(\sigma) = \text{left-by-}\sigma$ for $\sigma \in \operatorname{Morph}_{\mathcal{D}}(V, V')$, and define $H_{U'}$ similarly. If $H_U$ and $H_{U'}$ are naturally isomorphic functors, then U and U' are isomorphic objects in $\mathcal{D}$.

(2) A similar result is valid when $H_U$ and $H_{U'}$ are contravariant functors, $H_U$ being defined by $H_U(V) = \operatorname{Morph}_{\mathcal{D}}(V, U)$ and $H_U(\sigma) = \text{right-by-}\sigma$ for $\sigma \in \operatorname{Morph}_{\mathcal{D}}(V, V')$. The result in this case follows immediately by applying Proposition 6.16 to the opposite category $\mathcal{D}^{\mathrm{opp}}$ of $\mathcal{D}$ as defined in Problems 78-80 at the end of Chapter IV.

PROOF. Let φ be the element $T_{U'}^{-1}(1_{U'})$ of $\operatorname{Morph}(U, U')$, and let ψ be the element $T_U(1_U)$ of $\operatorname{Morph}(U', U)$. To prove the proposition, it is enough to show that $\varphi\psi = 1_{U'}$ and $\psi\varphi = 1_U$.
For σ in $\operatorname{Morph}(V, V')$, form the commutative diagram in the statement of the proposition. The commutativity says that

$$\sigma T_V(h) = T_{V'}(\sigma h) \qquad \text{for } h \in \operatorname{Morph}(U, V). \qquad (*)$$

Taking V = U, V' = U', σ = φ, and $h = 1_U$ in (*) proves the second equality of the chain

$$\varphi\psi = \varphi T_U(1_U) = T_{U'}(\varphi 1_U) = T_{U'}(\varphi) = 1_{U'}.$$

Taking V = U', V' = U, σ = ψ, and h = φ in (*) proves the first equality of the chain

$$T_U(\psi\varphi) = \psi T_{U'}(\varphi) = \psi 1_{U'} = \psi = T_U(1_U).$$

Applying $T_U^{-1}$, we obtain $\psi\varphi = 1_U$, as required. □

Corollary 6.17. Let $\mathcal{C}$ and $\mathcal{D}$ be categories, and let $F : \mathcal{C} \to \mathcal{D}$ and $G : \mathcal{C} \to \mathcal{D}$ be covariant functors. Suppose that to each pair of objects (A, V) in $\mathcal{C} \times \mathcal{D}$ corresponds a one-one onto function

$$T_{A,V} : \operatorname{Morph}(F(A), V) \to \operatorname{Morph}(G(A), V)$$

with the system $\{T_{A,V}\}$ natural in (A, V). Then the functors F and G are naturally isomorphic.

Remarks. A similar result is valid if $T_{A,V}$ carries $\operatorname{Morph}(V, F(A))$ to $\operatorname{Morph}(V, G(A))$ and/or if F and G are contravariant. To handle these situations, we apply the corollary to the opposite categories $\mathcal{D}^{\mathrm{opp}}$ and/or $\mathcal{C}^{\mathrm{opp}}$, as defined in Problems 78-80 at the end of Chapter IV, instead of to the categories $\mathcal{D}$ and/or $\mathcal{C}$.

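PROOF. By Proposition 6.16 and the hypotheses, the member $T_{A,G(A)}^{-1}(1_{G(A)})$ of $\operatorname{Morph}_{\mathcal{D}}(F(A), G(A))$ is an isomorphism. We are to prove that the system $\{T_{A,G(A)}^{-1}(1_{G(A)})\}$ is natural in A. If σ in $\operatorname{Morph}_{\mathcal{C}}(A, A')$ is given, then the naturality of $T_{A,V}$ in the V variable implies that the diagram

$$\begin{array}{ccc} \operatorname{Morph}_{\mathcal{D}}(F(A), G(A)) & \xrightarrow{\ T_{A,G(A)}\ } & \operatorname{Morph}_{\mathcal{D}}(G(A), G(A)) \\ \Big\downarrow{\scriptstyle \text{left-by-}G(\sigma)} & & \Big\downarrow{\scriptstyle \text{left-by-}G(\sigma)} \\ \operatorname{Morph}_{\mathcal{D}}(F(A), G(A')) & \xrightarrow{\ T_{A,G(A')}\ } & \operatorname{Morph}_{\mathcal{D}}(G(A), G(A')) \end{array}$$

commutes. Evaluating at $T_{A,G(A)}^{-1}(1_{G(A)}) \in \operatorname{Morph}_{\mathcal{D}}(F(A), G(A))$ the two equal compositions in the diagram, we obtain

$$G(\sigma) = G(\sigma)1_{G(A)} = T_{A,G(A')}\big(G(\sigma)\,T_{A,G(A)}^{-1}(1_{G(A)})\big). \qquad (*)$$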
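With σ as above, the naturality of $T_{A,V}$ in the A variable implies that the diagram

$$\begin{array}{ccc} \operatorname{Morph}_{\mathcal{D}}(F(A'), G(A')) & \xrightarrow{\ T_{A',G(A')}\ } & \operatorname{Morph}_{\mathcal{D}}(G(A'), G(A')) \\ \Big\downarrow{\scriptstyle \text{right-by-}F(\sigma)} & & \Big\downarrow{\scriptstyle \text{right-by-}G(\sigma)} \\ \operatorname{Morph}_{\mathcal{D}}(F(A), G(A')) & \xrightarrow{\ T_{A,G(A')}\ } & \operatorname{Morph}_{\mathcal{D}}(G(A), G(A')) \end{array}$$

commutes. Evaluating at $T_{A',G(A')}^{-1}(1_{G(A')}) \in \operatorname{Morph}_{\mathcal{D}}(F(A'), G(A'))$ the two equal compositions in the diagram, we obtain

$$G(\sigma) = 1_{G(A')}G(\sigma) = T_{A,G(A')}\big(T_{A',G(A')}^{-1}(1_{G(A')})\,F(\sigma)\big). \qquad (**)$$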

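Equations (*) and (**), together with the fact that $T_{A,G(A')}$ is invertible, say that

$$G(\sigma)\,T_{A,G(A)}^{-1}(1_{G(A)}) = T_{A',G(A')}^{-1}(1_{G(A')})\,F(\sigma).$$

In other words, the isomorphism $T_A \in \operatorname{Morph}_{\mathcal{D}}(F(A), G(A))$ given by $T_A = T_{A,G(A)}^{-1}(1_{G(A)})$ makes the diagram

$$\begin{array}{ccc} F(A) & \xrightarrow{\ T_A\ } & G(A) \\ \Big\downarrow{\scriptstyle F(\sigma)} & & \Big\downarrow{\scriptstyle G(\sigma)} \\ F(A') & \xrightarrow{\ T_{A'}\ } & G(A') \end{array}$$

commute. Thus F is naturally isomorphic to G. □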


Tensor product provides a device for converting a real vector space canonically into a complex vector space, so that a basis over $\mathbb{R}$ in the original space becomes a basis over $\mathbb{C}$ in the new space. If E is the given real vector space, then the complex vector space, called the complexification of E, is the space $E^{\mathbb{C}} = E \otimes_{\mathbb{R}} \mathbb{C}$ with multiplication by a complex number c in $E^{\mathbb{C}}$ defined to be $1 \otimes (z \mapsto cz)$.
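A minimal sketch of the complexification, assuming $E = \mathbb{R}^n$. Since $\mathbb{C} = \mathbb{R}1 \oplus \mathbb{R}i$, distributivity of tensor product over direct sums together with $E \otimes_{\mathbb{R}} \mathbb{R} \cong E$ shows that every element of $E^{\mathbb{C}}$ is uniquely $x \otimes 1 + y \otimes i$ with $x, y \in E$, so the sketch stores it as the pair (x, y). Working out $1 \otimes (z \mapsto cz)$ for $c = a + bi$ gives $(ax - by) \otimes 1 + (bx + ay) \otimes i$. The names `scalar_mul` and `add` are ours.

```python
def scalar_mul(c, v):
    """Multiply v = (x, y), standing for x ⊗ 1 + y ⊗ i in E ⊗_R C, by the complex number c.

    With c = a + bi, the map 1 ⊗ (z |-> cz) sends x ⊗ 1 + y ⊗ i
    to (a x - b y) ⊗ 1 + (b x + a y) ⊗ i.
    """
    a, b = c.real, c.imag
    x, y = v
    return ([a * s - b * t for s, t in zip(x, y)],
            [b * s + a * t for s, t in zip(x, y)])

def add(v, w):
    return ([p + q for p, q in zip(v[0], w[0])], [p + q for p, q in zip(v[1], w[1])])

v = ([1, 2], [0, -1])          # an element of the complexification of R^2
c1, c2 = 2 + 3j, -1 + 1j

print(scalar_mul(c1, scalar_mul(c2, v)) == scalar_mul(c1 * c2, v))   # True
print(scalar_mul(1 + 0j, v) == v)                                    # True

# The real basis vectors (e_k, 0) form a complex basis: v = sum_k (x_k + i y_k)(e_k, 0).
x, y = v
basis = [([1, 0], [0, 0]), ([0, 1], [0, 0])]
total = ([0, 0], [0, 0])
for k, e_k in enumerate(basis):
    total = add(total, scalar_mul(complex(x[k], y[k]), e_k))
print(total == v)   # True
```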

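This construction works more generally when we have any inclusion of fields $\mathbb{K} \subseteq \mathbb{L}$. In this situation, $\mathbb{L}$ becomes a vector space over $\mathbb{K}$ if scalar multiplication $\mathbb{K} \times \mathbb{L} \to \mathbb{L}$ is defined as the restriction of the multiplication $\mathbb{L} \times \mathbb{L} \to \mathbb{L}$ within $\mathbb{L}$. For any vector space E over $\mathbb{K}$, we define $E^{\mathbb{L}} = E \otimes_{\mathbb{K}} \mathbb{L}$, initially as a vector space over $\mathbb{K}$. For $c \in \mathbb{L}$, we then define

$$(\text{multiplication by } c \text{ in } E \otimes_{\mathbb{K}} \mathbb{L}) = 1 \otimes (\text{multiplication by } c \text{ in } \mathbb{L}).$$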
The above identities concerning tensor products of linear maps allow one easily
to prove the following identities:

$$\begin{aligned}
c_1(c_2v) &= (c_1c_2)v,\\
c(u + v) &= cu + cv,\\
(c_1 + c_2)v &= c_1v + c_2v,\\
1v &= v.
\end{aligned}$$

Together these identities say that $E^{\mathbb{L}} = E \otimes_{\mathbb{K}} \mathbb{L}$, with its vector-space addition
and the above definition of multiplication by scalars in $\mathbb{L}$, is a vector space over
$\mathbb{L}$. The further identity

$$c(e \otimes 1) = ce \otimes 1 \qquad \text{if } c \text{ is in } \mathbb{K} \text{ and } e \text{ is in } E$$

shows that its scalar multiplication is consistent with scalar multiplication in $E$
when the scalars are in $\mathbb{K}$ and $E$ is identified with the subset $E \otimes 1$ of $E^{\mathbb{L}}$.

Let us say that the pair $(E^{\mathbb{L}}, \iota)$, where $\iota : E \to E^{\mathbb{L}}$ is the mapping $u \mapsto u \otimes 1$,
is obtained by extension of scalars. This construction is characterized by a
universal mapping property as follows.

Proposition 6.18. Let $\mathbb{K} \subseteq \mathbb{L}$ be an inclusion of fields, and let $E$ be a vector
space over $\mathbb{K}$.

(a) If $(E^{\mathbb{L}}, \iota)$ is formed by extension of scalars, then $(E^{\mathbb{L}}, \iota)$ has the following
universal mapping property: whenever $U$ is a vector space over $\mathbb{L}$ and $\varphi : E \to U$
is a $\mathbb{K}$ linear map, there exists a unique $\mathbb{L}$ linear map $\Phi : E^{\mathbb{L}} \to U$ such that

$$\Phi\iota = \varphi.$$

(b) Suppose that $(V, j)$ is any pair in which $V$ is a vector space over $\mathbb{L}$ and
$j : E \to V$ is a $\mathbb{K}$ linear function such that the following universal mapping
property holds: whenever $U$ is a vector space over $\mathbb{L}$ and $\varphi : E \to U$ is a $\mathbb{K}$
linear map, there exists a unique $\mathbb{L}$ linear map $\Phi : V \to U$ such that $\Phi j = \varphi$.
Then there exists a unique isomorphism $\Phi : E^{\mathbb{L}} \to V$ of $\mathbb{L}$ vector spaces such that

$$\Phi\iota = j.$$

PROOF. In (a), for the uniqueness of $\Phi$, we must have $\Phi(e \otimes c) = c\,\Phi(e \otimes 1) =
c\,(\Phi\iota)(e) = c\,\varphi(e)$. Hence $\Phi$ is determined by $\varphi$ on pure tensors in $E \otimes_{\mathbb{K}} \mathbb{L}$ and
therefore everywhere.

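For existence let $\Phi : E \otimes_{\mathbb{K}} \mathbb{L} \to U$ be the $\mathbb{K}$ linear extension of the $\mathbb{K}$ bilinear
function of $E \times \mathbb{L}$ into $U$ given by

$$(e, c) \mapsto c\,\varphi(e) \qquad \text{for } e \in E \text{ and } c \in \mathbb{L}.$$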
In the $\mathbb{L}$ vector space $E^{\mathbb{L}}$, multiplication by a member $c_0$ of $\mathbb{L}$ is defined to
be $1 \otimes (\text{multiplication by } c_0)$. On a pure tensor $e \otimes c$, we therefore have

$$\Phi(c_0(e \otimes c)) = \Phi(e \otimes c_0c) = (c_0c)\varphi(e) = c_0(c\,\varphi(e)) = c_0\,\Phi(e \otimes c).$$

Since $E \otimes_{\mathbb{K}} \mathbb{L}$ is generated by pure tensors, $\Phi$ is $\mathbb{L}$ linear. By the construction of
$\Phi$, $\varphi(e) = \Phi(e \otimes 1) = (\Phi\iota)(e)$. Thus $\Phi$ has the required properties.

In (b), let $(V, j)$ have the same universal mapping property as $(E^{\mathbb{L}}, \iota)$. We
apply the universal mapping property of $(E^{\mathbb{L}}, \iota)$ to the $\mathbb{K}$ linear map $j : E \to V$
to obtain an $\mathbb{L}$ linear $\Phi : E^{\mathbb{L}} \to V$ with $\Phi\iota = j$, and we apply the universal
mapping property of $(V, j)$ to the $\mathbb{K}$ linear map $\iota : E \to E^{\mathbb{L}}$ to obtain an $\mathbb{L}$ linear
$\Phi' : V \to E^{\mathbb{L}}$ with $\Phi' j = \iota$. From $(\Phi'\Phi)\iota = \Phi' j = \iota$ and $1_{E^{\mathbb{L}}}\iota = \iota$, the
uniqueness in the universal mapping property for $(E^{\mathbb{L}}, \iota)$ implies $\Phi'\Phi = 1_{E^{\mathbb{L}}}$.
Arguing similarly, we obtain $\Phi\Phi' = 1_V$. Thus $\Phi$ is an isomorphism with the
required properties.

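If $\widetilde{\Phi} : E^{\mathbb{L}} \to V$ is another isomorphism with $\widetilde{\Phi}\iota = j$, then the argument just
given shows that $\Phi'\widetilde{\Phi} = 1_{E^{\mathbb{L}}}$ and $\widetilde{\Phi}\Phi' = 1_V$. Hence $\widetilde{\Phi} = (\Phi')^{-1} = \Phi$, and $\Phi$
is unique. □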

To make $E \mapsto E^{\mathbb{L}}$ into a covariant functor from vector spaces over $\mathbb{K}$ to vector
spaces over $\mathbb{L}$, we must examine the effect on linear maps. The tool is Proposition
6.18a. Thus let $E$ and $F$ be two vector spaces over $\mathbb{K}$, and let $M : E \to F$ be
a $\mathbb{K}$ linear map between them. We extend scalars for $E$ and $F$. The proposition
applies to the composition $E \to F \to F^{\mathbb{L}}$ and shows that the composition
extends uniquely to an $\mathbb{L}$ linear map from $E^{\mathbb{L}}$ to $F^{\mathbb{L}}$. A quick look at the proof
shows that this $\mathbb{L}$ linear map is $M \otimes 1$. Actually, we can see directly that $M \otimes 1$ is
indeed linear over $\mathbb{L}$ and not just over $\mathbb{K}$: we just use our identity for compositions
of tensor products to write

$$\begin{aligned}
(M \otimes 1)(1 \otimes (\text{multiplication by } c)) &= M \otimes (\text{multiplication by } c)\\
&= (1 \otimes (\text{multiplication by } c))(M \otimes 1).
\end{aligned}$$

In any event, the explicit form of the extended linear map as $M \otimes 1$ shows
immediately that the identity linear map goes to the identity and that compositions
go to compositions. Thus $E \mapsto E^{\mathbb{L}}$ is a covariant functor.

In the special case that the vector spaces are $\mathbb{K}^n$ and $\mathbb{K}^m$, extension of scalars
has a particularly simple interpretation. The new spaces may be viewed as $\mathbb{L}^n$
and $\mathbb{L}^m$. Thus column vectors with entries in $\mathbb{K}$ get replaced by column vectors
with entries in $\mathbb{L}$. What happens with linear mappings is even more transparent.
A linear map $M : E \to F$ is given by an $m$-by-$n$ matrix $A$ with entries in $\mathbb{K}$, and
the linear map $M \otimes 1 : E^{\mathbb{L}} \to F^{\mathbb{L}}$ is the one given by the same matrix $A$. Now
the entries of $A$ are to be regarded as members of the larger field $\mathbb{L}$. Viewed this
way, extension of scalars might look as if it is dependent on choices of bases, but
the tensor-product formalism shows that it is not.
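As a concrete check of the last paragraph, here is a minimal Python sketch for $\mathbb{K} = \mathbb{R}$ and $\mathbb{L} = \mathbb{C}$, assuming $E = \mathbb{R}^3$ and $F = \mathbb{R}^2$ with their standard bases. The helper names `apply` and `scale` are illustrative, not taken from the text. The same real matrix $A$ is applied to complex column vectors, and on one example we confirm that $M \otimes 1$ commutes with multiplication by a complex scalar $c$, that is, it is linear over $\mathbb{C}$ and not just over $\mathbb{R}$.

```python
# Extension of scalars from R to C in coordinates (assumption: E = R^3 and
# F = R^2 with standard bases, so E^C = C^3 and F^C = C^2).

def apply(A, v):
    """Multiply the matrix A (a list of rows) by the column vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def scale(c, v):
    """Multiply every coordinate of v by the scalar c."""
    return [c * x for x in v]


# A real 2-by-3 matrix: the matrix of a real linear map M : R^3 -> R^2.
A = [[1.0, 2.0, 0.0],
     [0.0, -1.0, 3.0]]

# A vector of C^3 and a complex scalar.
v = [1 + 2j, -1j, 0.5 + 0j]
c = 2 - 3j

# M ⊗ 1 is given by the same matrix A, now acting on complex vectors.
lhs = apply(A, scale(c, v))   # (M ⊗ 1)(c v)
rhs = scale(c, apply(A, v))   # c (M ⊗ 1)(v)
```

Both lists come out as $(2 - 3i,\ 6 - 2.5i)$ here; a single example is of course only an illustration of the general identity above, not a proof.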

A related notion to extension of scalars is that of restriction of scalars. Again
with an inclusion $\mathbb{K} \subseteq \mathbb{L}$ of fields, a vector space $E$ over the larger field $\mathbb{L}$
becomes a vector space over the smaller field $\mathbb{K}$ by ignoring unnecessary
scalar multiplications. Although this notion is related to extension of scalars, it
is not inverse to it. For example, if the two fields are $\mathbb{R}$ and $\mathbb{C}$ and if we start with
an $n$-dimensional vector space $E$ over $\mathbb{R}$, then $E^{\mathbb{C}}$ is a complex vector space of
dimension $n$ and $(E^{\mathbb{C}})_{\mathbb{R}}$ is a real vector space of dimension $2n$. We thus do not
get back to the original space $E$.
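The doubling of dimension under restriction of scalars is easy to see in coordinates. The sketch below assumes the real basis $e_1, \dots, e_n, ie_1, \dots, ie_n$ of $\mathbb{C}^n$ (a choice made here only for illustration), so that $z = x + iy$ has real coordinates $(x, y)$ and a complex matrix $P + iQ$ becomes the real $2n$-by-$2n$ block matrix with block rows $(P, -Q)$ and $(Q, P)$. The function names are hypothetical helpers.

```python
# Restriction of scalars from C to R in coordinates (assumption: the R-basis
# of C^n is e_1, ..., e_n, i e_1, ..., i e_n, so z = x + i y has coordinates (x, y)).

def apply(A, v):
    """Multiply the matrix A (a list of rows) by the column vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def realify_vector(z):
    """Real coordinates of z in C^n: real parts first, then imaginary parts."""
    return [x.real for x in z] + [x.imag for x in z]


def realify_matrix(A):
    """Real 2n-by-2n matrix [[P, -Q], [Q, P]] of the complex matrix A = P + iQ."""
    n = len(A)
    top = [[A[r][k].real for k in range(n)] + [-A[r][k].imag for k in range(n)]
           for r in range(n)]
    bottom = [[A[r][k].imag for k in range(n)] + [A[r][k].real for k in range(n)]
              for r in range(n)]
    return top + bottom


A = [[1 + 1j, 2 + 0j],
     [0j, -1j]]
z = [3 - 1j, 2 + 5j]

# A complex-linear map of C^2 becomes a real-linear map of R^4.
direct = realify_vector(apply(A, z))
via_real = apply(realify_matrix(A), realify_vector(z))
```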


7.  Tensor  Algebra 

Just  as  polynomial  rings  are  often  used  in  the  construction  of  more  general 
commutative  rings,  so  “tensor  algebras”  are  often  used  in  the  construction  of 
more  general  rings  that  may  not  be  commutative.  In  this  section  we  construct  the 
“tensor  algebra”  of  a  vector  space  as  a  direct  sum  of  iterated  tensor  products  of 
the  vector  space  with  itself,  and  we  establish  its  properties.  We  shall  proceed  with 
care,  in  order  to  provide  a  complete  proof  of  the  associativity  of  the  multiplication. 

Let $A$, $B$, and $C$ be vector spaces over a field $\mathbb{K}$. A triple tensor product $V =
A \otimes_{\mathbb{K}} B \otimes_{\mathbb{K}} C$ is a vector space over $\mathbb{K}$ with a 3-linear map $\iota : A \times B \times C \to V$
having the following universal mapping property: whenever $t$ is a 3-linear mapping
of $A \times B \times C$ into a vector space $U$ over $\mathbb{K}$, then there exists a unique linear
mapping $T$ of $V$ into $U$ such that the diagram in Figure 6.4 commutes.

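$$\begin{array}{ccc}
A \times B \times C & \xrightarrow{\;t\;} & U \\
{\scriptstyle \iota}\Big\downarrow & \nearrow{\scriptstyle T} & \\
V = A \otimes_{\mathbb{K}} B \otimes_{\mathbb{K}} C & &
\end{array}$$

Figure 6.4. Commutative diagram of a triple tensor product.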

The usual argument with universal mapping properties shows that there is at
most one triple tensor product up to a well-determined isomorphism, and one can
give an explicit construction of it that is similar to the one for ordinary tensor
products $E \otimes_{\mathbb{K}} F$. We shall not need that particular proof of existence since
Proposition 6.19a below will give us an alternative argument. Once we have that
statement, we shall use the uniqueness of triple tensor products to establish in
Proposition 6.19b an associativity formula for ordinary iterated tensor products.
A shorter proof of Proposition 6.19b, which avoids Proposition 6.19a and uses
naturality, will be given after the proof of Proposition 6.20.

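Proposition 6.19. If $\mathbb{K}$ is a field and $A$, $B$, $C$ are vector spaces over $\mathbb{K}$, then

(a) $(A \otimes_{\mathbb{K}} B) \otimes_{\mathbb{K}} C$ and $A \otimes_{\mathbb{K}} (B \otimes_{\mathbb{K}} C)$ are triple tensor products,

(b) there exists a unique $\mathbb{K}$ isomorphism $\Phi$ from left to right in

$$(A \otimes_{\mathbb{K}} B) \otimes_{\mathbb{K}} C \cong A \otimes_{\mathbb{K}} (B \otimes_{\mathbb{K}} C)$$

such that $\Phi((a \otimes b) \otimes c) = a \otimes (b \otimes c)$ for all $a \in A$, $b \in B$, and $c \in C$.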

Proof. In (a), consider $(A \otimes_{\mathbb{K}} B) \otimes_{\mathbb{K}} C$. Let $t : A \times B \times C \to U$ be
3-linear. For $c \in C$, define $t_c : A \times B \to U$ by $t_c(a, b) = t(a, b, c)$. Then $t_c$
is bilinear and hence extends to a linear $T_c : A \otimes_{\mathbb{K}} B \to U$. Since $t$ is 3-linear,
$t_{c_1+c_2} = t_{c_1} + t_{c_2}$ and $t_{xc} = x\,t_c$ for scalar $x$; thus uniqueness of the linear extension
forces $T_{c_1+c_2} = T_{c_1} + T_{c_2}$ and $T_{xc} = x\,T_c$. Consequently

$$t' : (A \otimes_{\mathbb{K}} B) \times C \to U$$

given by $t'(d, c) = T_c(d)$ is bilinear and therefore extends to a linear
$T : (A \otimes_{\mathbb{K}} B) \otimes_{\mathbb{K}} C \to U$. This $T$ proves existence of the linear extension of the
given $t$. Uniqueness is trivial, since the elements $(a \otimes b) \otimes c$ span $(A \otimes_{\mathbb{K}} B) \otimes_{\mathbb{K}} C$.
So $(A \otimes_{\mathbb{K}} B) \otimes_{\mathbb{K}} C$ is a triple tensor product. In a similar fashion, $A \otimes_{\mathbb{K}} (B \otimes_{\mathbb{K}} C)$
is a triple tensor product.

For (b), set up the diagram of the universal mapping property for a triple tensor
product, using $V = (A \otimes_{\mathbb{K}} B) \otimes_{\mathbb{K}} C$, $U = A \otimes_{\mathbb{K}} (B \otimes_{\mathbb{K}} C)$, and $t(a, b, c) =
a \otimes (b \otimes c)$. We have just seen in (a) that $V$ is a triple tensor product with
$\iota(a, b, c) = (a \otimes b) \otimes c$. Thus there exists a linear $T : V \to U$ with $T\iota(a, b, c) =
t(a, b, c)$. This equation means that $T((a \otimes b) \otimes c) = a \otimes (b \otimes c)$. Interchanging
the roles of $(A \otimes_{\mathbb{K}} B) \otimes_{\mathbb{K}} C$ and $A \otimes_{\mathbb{K}} (B \otimes_{\mathbb{K}} C)$, we obtain a two-sided inverse
for $T$. Thus $T$ will serve as $\Phi$ in (b), and existence is proved. Uniqueness is
trivial, since the elements $(a \otimes b) \otimes c$ span $(A \otimes_{\mathbb{K}} B) \otimes_{\mathbb{K}} C$. □

When there is no danger of confusion, Proposition 6.19 allows us to write a
triple tensor product without parentheses as $A \otimes_{\mathbb{K}} B \otimes_{\mathbb{K}} C$. The same argument
as in Corollaries 6.11 and 6.12 shows that the vector space of 3-linear forms on
$A \times B \times C$ is canonically isomorphic to the dual of the vector space $A \otimes_{\mathbb{K}} B \otimes_{\mathbb{K}} C$.
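In coordinates, Proposition 6.19b becomes very concrete. Assuming standard bases, with the basis vectors of a tensor product listed in lexicographic order (an illustrative convention, not one fixed by the text), the coordinates of a pure tensor are the Kronecker product of the coordinate vectors. The Python sketch below checks on one example that $(a \otimes b) \otimes c$ and $a \otimes (b \otimes c)$ then have literally the same coordinate list, so that in these coordinates $\Phi$ acts as the identity. The name `kron` is a hypothetical helper.

```python
# Coordinates of pure tensors (assumption: standard bases, with the basis
# vectors e_i ⊗ f_j of a tensor product listed in lexicographic order).

def kron(u, v):
    """Coordinates of u ⊗ v: all products u[i] * v[j], with i varying slowest."""
    return [a * b for a in u for b in v]


a = [1, 2]
b = [3, 0, -1]
c = [5, 7]

left = kron(kron(a, b), c)    # (a ⊗ b) ⊗ c in (A ⊗ B) ⊗ C
right = kron(a, kron(b, c))   # a ⊗ (b ⊗ c) in A ⊗ (B ⊗ C)

# Both lists hold a[i] * b[j] * c[k] at position (i, j, k) in lexicographic
# order, so the two coordinate lists coincide.
assert left == right
```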

Just as with Corollary 6.13 and Proposition 6.15, the result of Proposition 6.19
can be improved by saying that the isomorphism is natural in the variables $A$, $B$,
and $C$, as follows.
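Proposition 6.20. Let $A$, $B$, $C$, $A_1$, $B_1$, and $C_1$ be vector spaces over a field
$\mathbb{K}$, and let $L_A : A \to A_1$, $L_B : B \to B_1$, and $L_C : C \to C_1$ be linear maps.
Then the isomorphism $\Phi$ of Proposition 6.19b is natural in the triple $(A, B, C)$
in the sense that the diagram

$$\begin{array}{ccc}
(A \otimes_{\mathbb{K}} B) \otimes_{\mathbb{K}} C & \xrightarrow{\;\Phi\;} & A \otimes_{\mathbb{K}} (B \otimes_{\mathbb{K}} C) \\
{\scriptstyle (L_A \otimes L_B) \otimes L_C}\Big\downarrow & & \Big\downarrow{\scriptstyle L_A \otimes (L_B \otimes L_C)} \\
(A_1 \otimes_{\mathbb{K}} B_1) \otimes_{\mathbb{K}} C_1 & \xrightarrow{\;\Phi\;} & A_1 \otimes_{\mathbb{K}} (B_1 \otimes_{\mathbb{K}} C_1)
\end{array}$$

commutes.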

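Proof. We have

$$\begin{aligned}
((L_A \otimes (L_B \otimes L_C)) \circ \Phi)((a \otimes b) \otimes c)
&= (L_A \otimes (L_B \otimes L_C))(a \otimes (b \otimes c))\\
&= L_Aa \otimes (L_B \otimes L_C)(b \otimes c)\\
&= L_Aa \otimes (L_Bb \otimes L_Cc)\\
&= \Phi((L_Aa \otimes L_Bb) \otimes L_Cc)\\
&= \Phi((L_A \otimes L_B)(a \otimes b) \otimes L_Cc)\\
&= (\Phi \circ ((L_A \otimes L_B) \otimes L_C))((a \otimes b) \otimes c),
\end{aligned}$$

and the proposition follows, since the elements $(a \otimes b) \otimes c$ span $(A \otimes_{\mathbb{K}} B) \otimes_{\mathbb{K}} C$. □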
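Under the same kind of coordinate convention (standard bases and lexicographically ordered product bases, both assumptions made for illustration), a tensor product of linear maps is a Kronecker product of matrices, and the naturality square of Proposition 6.20 reduces to two identities that can be checked on an example. The helper names and the sample maps below are hypothetical.

```python
def kron_vec(u, v):
    """Coordinates of u ⊗ v (lexicographically ordered product basis)."""
    return [a * b for a in u for b in v]


def kron_mat(S, T):
    """Matrix of S ⊗ T in the lexicographically ordered product bases."""
    return [[s * t for s in row_s for t in row_t] for row_s in S for row_t in T]


def apply(M, v):
    """Multiply the matrix M (a list of rows) by the column vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


L_A = [[1, 2], [0, 1]]          # A = K^2 -> A_1 = K^2
L_B = [[3, -1]]                 # B = K^2 -> B_1 = K^1
L_C = [[0, 1], [1, 1], [2, 0]]  # C = K^2 -> C_1 = K^3
a, b, c = [1, -2], [4, 5], [2, 3]

# With Φ acting as the identity on coordinates, the two vertical maps of the
# square must be the same matrix.
down_left = kron_mat(kron_mat(L_A, L_B), L_C)    # (L_A ⊗ L_B) ⊗ L_C
down_right = kron_mat(L_A, kron_mat(L_B, L_C))   # L_A ⊗ (L_B ⊗ L_C)

# Evaluation on a pure tensor, as in the first and last lines of the proof.
image = apply(down_left, kron_vec(kron_vec(a, b), c))
expected = kron_vec(apply(L_A, a), kron_vec(apply(L_B, b), apply(L_C, c)))
```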


The  treatment  of  Propositions  6.19  and  6.20  can  be  shortened  if  we  are  willing 
to  bypass  the  notion  of  a  triple  tensor  product  and  use  what  was  proved  about 
naturality  in  the  previous  section.  The  result  and  the  proof  are  as  follows. 

Proposition 6.20'. Let $A$, $B$, and $C$ be vector spaces over a field $\mathbb{K}$. Then
there is an isomorphism $\Phi : (A \otimes_{\mathbb{K}} B) \otimes_{\mathbb{K}} C \to A \otimes_{\mathbb{K}} (B \otimes_{\mathbb{K}} C)$ that is natural
in the triple $(A, B, C)$ and satisfies $\Phi((a \otimes b) \otimes c) = a \otimes (b \otimes c)$.

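PROOF. Writing $\cong$ for “naturally isomorphic in all variables” and applying
Proposition 6.15 and other natural isomorphisms of the previous section repeatedly, we have

$$\begin{aligned}
\mathrm{Hom}_{\mathbb{K}}((A \otimes_{\mathbb{K}} B) \otimes_{\mathbb{K}} C, V) &\cong \mathrm{Hom}_{\mathbb{K}}(A \otimes_{\mathbb{K}} B, \mathrm{Hom}_{\mathbb{K}}(C, V))\\
&\cong \mathrm{Hom}_{\mathbb{K}}(B, \mathrm{Hom}_{\mathbb{K}}(A, \mathrm{Hom}_{\mathbb{K}}(C, V)))\\
&\cong \mathrm{Hom}_{\mathbb{K}}(B, \mathrm{Hom}_{\mathbb{K}}(A \otimes_{\mathbb{K}} C, V))\\
&\cong \mathrm{Hom}_{\mathbb{K}}(B, \mathrm{Hom}_{\mathbb{K}}(C \otimes_{\mathbb{K}} A, V))\\
&\cong \mathrm{Hom}_{\mathbb{K}}((C \otimes_{\mathbb{K}} B) \otimes_{\mathbb{K}} A, V) \qquad \text{by symmetry}\\
&\cong \mathrm{Hom}_{\mathbb{K}}(A \otimes_{\mathbb{K}} (C \otimes_{\mathbb{K}} B), V)\\
&\cong \mathrm{Hom}_{\mathbb{K}}(A \otimes_{\mathbb{K}} (B \otimes_{\mathbb{K}} C), V).
\end{aligned}$$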
Then the existence of the natural isomorphism follows from Corollary 6.17. Using
the explicit formula for the isomorphism in Proposition 6.16 and tracking matters
down, we see that $\Phi((a \otimes b) \otimes c) = a \otimes (b \otimes c)$. □

There is no difficulty in generalizing matters to $n$-fold tensor products by
induction. An $n$-fold tensor product is to be universal for $n$-multilinear maps.
Again it is unique up to canonical isomorphism, as one proves by an argument
that runs along familiar lines. A direct construction of an $n$-fold tensor product
is possible in the style of the proof for ordinary tensor products, but such a
construction will not be needed. Instead, we can form an $n$-fold tensor product
as the $(n-1)$-fold tensor product of the first $n-1$ spaces, tensored with the $n$th
space. Proposition 6.19b allows us to regroup parentheses (inductively) in any
fashion we choose, and the same argument as in Corollaries 6.11 and 6.12 yields
the following proposition.

Proposition 6.21. If $E_1, \dots, E_n$, and $V$ are vector spaces over $\mathbb{K}$, then the
vector space $\mathrm{Hom}_{\mathbb{K}}(E_1 \otimes_{\mathbb{K}} \cdots \otimes_{\mathbb{K}} E_n, V)$ is canonically isomorphic (via restriction
to pure tensors) to the vector space of all $V$-valued $n$-multilinear functions
on $E_1 \times \cdots \times E_n$. In particular the vector space of all $n$-multilinear forms on
$E_1 \times \cdots \times E_n$ is canonically isomorphic to $(E_1 \otimes_{\mathbb{K}} \cdots \otimes_{\mathbb{K}} E_n)'$.
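Proposition 6.21 can be illustrated numerically. In this Python sketch, assuming finite-dimensional spaces $\mathbb{K}^{d_1}, \dots, \mathbb{K}^{d_n}$ with standard bases and lexicographically ordered product bases, an $n$-multilinear form is stored by its values on tuples of basis vectors; the same numbers, read as a linear functional on $E_1 \otimes \cdots \otimes E_n$, give the same value on a pure tensor. The function names and sample data are illustrative only.

```python
from functools import reduce
from itertools import product


def kron(u, v):
    """Coordinates of u ⊗ v (lexicographically ordered product basis)."""
    return [a * b for a in u for b in v]


def pure_tensor(vectors):
    """Coordinates of v_1 ⊗ ... ⊗ v_n, built as ((v_1 ⊗ v_2) ⊗ ...) ⊗ v_n."""
    return reduce(kron, vectors)


def multilinear(coeff, vectors):
    """n-multilinear form with coeff[(i_1, ..., i_n)] = f(e_{i_1}, ..., e_{i_n})."""
    total = 0
    for idx in product(*(range(len(v)) for v in vectors)):
        term = coeff[idx]
        for v, i in zip(vectors, idx):
            term *= v[i]
        total += term
    return total


def as_functional(coeff, dims):
    """The same coefficients listed as a linear functional on E_1 ⊗ ... ⊗ E_n."""
    return [coeff[idx] for idx in product(*(range(d) for d in dims))]


dims = (2, 3, 2)
coeff = {idx: (idx[0] + 1) * (idx[1] - idx[2])
         for idx in product(*(range(d) for d in dims))}
vectors = [[1, -1], [2, 0, 3], [4, 1]]

direct = multilinear(coeff, vectors)
via_tensor = sum(w * x for w, x in zip(as_functional(coeff, dims),
                                       pure_tensor(vectors)))
```

The ordering produced by `itertools.product` (last index varying fastest) matches the ordering produced by repeated `kron`, which is what makes the two evaluations line up.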

Iterated application of Proposition 6.20 shows that we get also a well-defined
notion of a linear map $L_1 \otimes \cdots \otimes L_n$, the tensor product of $n$ linear maps. Thus
$(E_1, \dots, E_n) \mapsto E_1 \otimes_{\mathbb{K}} \cdots \otimes_{\mathbb{K}} E_n$ is a functor. There is no need to write out the
details.

We turn to the question of defining a multiplication operation on tensors. If $\mathbb{K}$
is a field, an algebra⁴ over $\mathbb{K}$ is a vector space $V$ over $\mathbb{K}$ with a multiplication
or product operation $V \times V \to V$ that is $\mathbb{K}$ bilinear. The additive part of the $\mathbb{K}$
bilinearity means that the product operation satisfies the distributive laws

$$a(b + c) = ab + ac \quad \text{and} \quad (b + c)a = ba + ca \qquad \text{for all } a, b, c \text{ in } V,$$

and the scalar-multiplication part of the $\mathbb{K}$ bilinearity means that

$$(ka)b = k(ab) = a(kb) \qquad \text{for all } k \text{ in } \mathbb{K} \text{ and } a, b \text{ in } V.$$

Within  the  text  of  the  book,  we  shall  work  mostly  just  with  associative 
algebras,  i.e.,  those  algebras  satisfying  the  usual  associative  law 

a(bc)  =  (ab)c  for  all  a,  b,  c  in  V. 

⁴Some authors use the term “algebra” to mean what we shall call an “associative algebra.”




An associative algebra is therefore a ring and a vector space, the scalar multiplication
and the ring multiplication being linked by the requirement that $(ka)b =
k(ab) = a(kb)$ for all scalars $k$. Some commutative examples of associative algebras
over $\mathbb{K}$ are any field $\mathbb{L}$ containing $\mathbb{K}$, the polynomial algebra $\mathbb{K}[X_1, \dots, X_n]$,
and the algebra of all $\mathbb{K}$-valued functions on a nonempty set $S$. Two noncommutative
examples of associative algebras over $\mathbb{K}$ are the matrix algebra $M_n(\mathbb{K})$, with
matrix multiplication as its product, and $\mathrm{Hom}_{\mathbb{K}}(V, V)$ for any vector space $V$,
with composition as its product. The division ring $\mathbb{H}$ of quaternions (Example 10
in Section IV.1) is another example of a noncommutative associative algebra
over $\mathbb{R}$.
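To make the noncommutativity of $\mathbb{H}$ concrete, here is a small Python sketch of quaternion multiplication. It assumes the usual Hamilton rules $i^2 = j^2 = k^2 = -1$, $ij = k$, $jk = i$, $ki = j$, and represents $a + bi + cj + dk$ by the tuple `(a, b, c, d)`; the name `qmul` is illustrative.

```python
def qmul(p, q):
    """Hamilton product; the tuple (a, b, c, d) stands for a + b i + c j + d k."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
            a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
            a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
            a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2)


one, i, j, k = (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)
```

With these rules, `qmul(i, j)` is `k` while `qmul(j, i)` is `(0, 0, 0, -1)`, so the product is not commutative, although it is associative and bilinear over $\mathbb{R}$.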

Despite our emphasis on algebras that are associative, certain kinds of nonassociative
algebras are of great importance in applications, and consequently several
problems at the end of the chapter make use of nonassociative algebras. A
nonassociative algebra is determined by its vector-space structure and the multiplication
table for the members of a $\mathbb{K}$ basis. There is no restriction on the
multiplication table; all multiplication tables define algebras. Perhaps the best-known
nonassociative algebra is the 3-dimensional algebra over $\mathbb{R}$ determined by
vector product in $\mathbb{R}^3$. A basis is $\{i, j, k\}$, the multiplication operation is denoted
by $\times$, and the multiplication table is

$$\begin{array}{lll}
i \times i = 0, & i \times j = k, & i \times k = -j,\\
j \times i = -k, & j \times j = 0, & j \times k = i,\\
k \times i = j, & k \times j = -i, & k \times k = 0.
\end{array}$$

Since $i \times (i \times k) = i \times (-j) = -k$ and $(i \times i) \times k = 0$, vector product is not
associative. The vector-product algebra is a special case of a Lie algebra; Lie
algebras are defined in Problems 31–35 at the end of the chapter.
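The multiplication table above is exactly the coordinate formula for the cross product with respect to $i = (1, 0, 0)$, $j = (0, 1, 0)$, $k = (0, 0, 1)$. This Python sketch, with `cross` an illustrative helper name, checks the table entries used in the nonassociativity argument.

```python
def cross(u, v):
    """Vector product in R^3 with respect to the basis i, j, k."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


i, j, k = (1, 0, 0), (0, 1, 0), (0, 0, 1)

left = cross(i, cross(i, k))    # i x (i x k) = i x (-j) = -k
right = cross(cross(i, i), k)   # (i x i) x k = 0 x k = 0
```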

Tensor algebras, which we shall now construct, will be associative algebras.
Fix a vector space $E$ over $\mathbb{K}$, and for integers $n \geq 1$, let $T^n(E)$ be the $n$-fold
tensor product of $E$ with itself. In the case $n = 0$, we let $T^0(E)$ be the field $\mathbb{K}$.
Define, initially as a vector space, $T(E)$ to be the direct sum

$$T(E) = \bigoplus_{n=0}^{\infty} T^n(E).$$

The elements that lie in one or another $T^n(E)$ are called homogeneous. We
define a bilinear multiplication on homogeneous elements

$$T^m(E) \times T^n(E) \to T^{m+n}(E)$$

to be the restriction of the canonical isomorphism

$$T^m(E) \otimes_{\mathbb{K}} T^n(E) \to T^{m+n}(E)$$

resulting from iterating Proposition 6.19b. This multiplication, denoted by $\otimes$, is
associative, as far as it goes, because the restriction of the $\mathbb{K}$ isomorphism

$$T^l(E) \otimes_{\mathbb{K}} (T^m(E) \otimes_{\mathbb{K}} T^n(E)) \to (T^l(E) \otimes_{\mathbb{K}} T^m(E)) \otimes_{\mathbb{K}} T^n(E)$$

to $T^l(E) \times (T^m(E) \times T^n(E))$ factors through the map

$$T^l(E) \times (T^m(E) \times T^n(E)) \to (T^l(E) \times T^m(E)) \times T^n(E)$$

given by $(r, (s, t)) \mapsto ((r, s), t)$.

This much tells how to multiply homogeneous elements in $T(E)$. Since each
element $t$ in $T(E)$ has a unique expansion as a finite sum $t = \sum_{k=0}^{N} t_k$ with
$t_k \in T^k(E)$, we can define the product of this $t$ and the element $t' = \sum_{k'=0}^{N'} t'_{k'}$ to
be the element

$$t \otimes t' = \sum_{l=0}^{N+N'} \sum_{k+k'=l} (t_k \otimes t'_{k'});$$

the expression $\sum_{k+k'=l} (t_k \otimes t'_{k'})$ is the component of the product in $T^l(E)$.
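A minimal Python model of this product, assuming $E$ has a finite basis $u_0, u_1, \dots$ (indexed from 0 here for convenience): a basis element $u_{i_1} \otimes \cdots \otimes u_{i_n}$ of $T^n(E)$ is stored as the word `(i_1, ..., i_n)`, the empty word stands for $1 \in T^0(E)$, and an element of $T(E)$ is a dictionary from words to nonzero coefficients. Multiplying basis elements concatenates words, which carries out the degree bookkeeping $k + k' = l$ in the formula for $t \otimes t'$. The name `tmul` is a hypothetical helper.

```python
from collections import defaultdict


def tmul(s, t):
    """Product s ⊗ t in T(E); a key of length n is a basis element of T^n(E)."""
    out = defaultdict(int)
    for w1, c1 in s.items():          # component of s in T^{len(w1)}(E)
        for w2, c2 in t.items():      # component of t in T^{len(w2)}(E)
            out[w1 + w2] += c1 * c2   # lands in T^{len(w1) + len(w2)}(E)
    return {w: c for w, c in out.items() if c != 0}


ONE = {(): 1}                         # the identity 1 in T^0(E) = K
u0, u1 = {(0,): 1}, {(1,): 1}         # basis vectors u_0, u_1 of E = T^1(E)
s = {(): 2, (0,): 1, (1, 0): -3}      # 2 + u_0 - 3 u_1 ⊗ u_0
```

In this model the identity, associativity, and noncommutativity of $T(E)$ can be checked directly; for instance `tmul(u0, u1)` and `tmul(u1, u0)` are different words.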

Multiplication is thereby well defined in $T(E)$, and it satisfies the distributive
laws and is associative. Thus $T(E)$ becomes an associative algebra with a
(two-sided) identity, namely the element 1 in $T^0(E)$. In the presence of the
identification $\iota : E \to T^1(E)$, $T(E)$ is known as the tensor algebra of $E$. The
pair $(T(E), \iota)$ has the universal mapping property given in Proposition 6.22
and pictured in Figure 6.5.


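$$\begin{array}{ccc}
E & \xrightarrow{\;l\;} & A \\
{\scriptstyle \iota}\Big\downarrow & \nearrow{\scriptstyle L} & \\
T(E) & &
\end{array}$$

FIGURE 6.5. Universal mapping property of a tensor algebra.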

Proposition 6.22. The pair $(T(E), \iota)$ has the following universal mapping
property: whenever $l : E \to A$ is a linear map from $E$ into an associative algebra
with identity, then there exists a unique associative algebra homomorphism
$L : T(E) \to A$ with $L(1) = 1$ such that the diagram in Figure 6.5 commutes.

PROOF. Uniqueness is clear, since $E$ and 1 generate $T(E)$ as an algebra. For
existence we define $L^{(n)}$ on $T^n(E)$ to be the linear extension of the $n$-multilinear
map

$$(v_1, v_2, \dots, v_n) \mapsto l(v_1)l(v_2)\cdots l(v_n),$$

and we let $L = \bigoplus L^{(n)}$ in obvious notation. Let $u_1 \otimes \cdots \otimes u_m$ be in $T^m(E)$ and
$v_1 \otimes \cdots \otimes v_n$ be in $T^n(E)$. Then we have

$$\begin{aligned}
L^{(m)}(u_1 \otimes \cdots \otimes u_m) &= l(u_1)\cdots l(u_m),\\
L^{(n)}(v_1 \otimes \cdots \otimes v_n) &= l(v_1)\cdots l(v_n),\\
L^{(m+n)}(u_1 \otimes \cdots \otimes u_m \otimes v_1 \otimes \cdots \otimes v_n) &= l(u_1)\cdots l(u_m)\,l(v_1)\cdots l(v_n).
\end{aligned}$$

Hence

$$L^{(m)}(u_1 \otimes \cdots \otimes u_m)\,L^{(n)}(v_1 \otimes \cdots \otimes v_n) = L^{(m+n)}(u_1 \otimes \cdots \otimes u_m \otimes v_1 \otimes \cdots \otimes v_n).$$

Taking linear combinations, we see that $L$ is a homomorphism. □

Proposition 6.22 allows us to make $E \mapsto T(E)$ into a functor from the category
of vector spaces over $\mathbb{K}$ to the category of associative algebras with identity over
$\mathbb{K}$. To carry out the construction, we suppose that $\varphi : E \to F$ is a linear map
between two vector spaces over $\mathbb{K}$. If $\iota : E \to T(E)$ and $j : F \to T(F)$ are the
inclusion maps, then $j\varphi$ is a linear map from $E$ into $T(F)$, and Proposition 6.22
produces a unique algebra homomorphism $\Phi : T(E) \to T(F)$ carrying 1 to 1
and satisfying $\Phi\iota = j\varphi$. Then the tensor-algebra functor is defined to carry the
linear map $\varphi$ to the homomorphism $\Phi$ of associative algebras with identity.

For the situation in which $R$ is a commutative ring with identity, Section
IV.5 introduced the ring $R[X_1, \dots, X_n]$ of polynomials in $n$ commuting indeterminates
with coefficients in $R$. This ring was characterized by a universal
mapping property saying that if a ring homomorphism of $R$ into a commutative
ring with identity were given and if $n$ elements $t_1, \dots, t_n$ were given, then the
ring homomorphism of $R$ could be extended uniquely to a ring homomorphism
of $R[X_1, \dots, X_n]$ carrying $X_j$ into $t_j$ for each $j$.

Proposition 6.22 yields a noncommutative version of this result, except that the
ring of coefficients is assumed this time to be a field $\mathbb{K}$. To arrange for $X_1, \dots, X_n$
to be noncommuting indeterminates, we form a vector space with $\{X_1, \dots, X_n\}$
as a basis. Thus we let $E = \bigoplus_{j=1}^{n} \mathbb{K}X_j$. If $t_1, \dots, t_n$ are arbitrary elements of an
associative algebra $A$ with identity, then the formulas $l(X_j) = t_j$ for $1 \leq j \leq n$
define a linear map $l : E \to A$. The associative-algebra homomorphism
$L : T(E) \to A$ produced by the proposition extends the inclusion of $\mathbb{K}$ into
the subfield $\mathbb{K}1$ of $A$ and carries each $X_j$ to $t_j$.
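The homomorphism $L$ just described can be sketched in Python with $A = M_2(\mathbb{Q})$, using integer matrices. Assumptions for illustration: noncommutative polynomials are stored as dictionaries from words to coefficients, with index 0 standing for $X_1$ and index 1 for $X_2$; the matrices chosen for $t_1, t_2$ and all function names are hypothetical. On one example the checks confirm that $L$ is multiplicative and that $X_1X_2$ and $X_2X_1$ have different images.

```python
from collections import defaultdict


def tmul(s, t):
    """Product in T(E): concatenate words and multiply coefficients."""
    out = defaultdict(int)
    for w1, c1 in s.items():
        for w2, c2 in t.items():
            out[w1 + w2] += c1 * c2
    return {w: c for w, c in out.items() if c != 0}


def mat_mul(X, Y):
    """Product of two square matrices given as lists of rows."""
    size = len(X)
    return [[sum(X[r][m] * Y[m][c] for m in range(size)) for c in range(size)]
            for r in range(size)]


def evaluate(s, ts):
    """Image of s under the homomorphism L with L(1) = I and L(X_j) = ts[j]."""
    size = len(ts[0])
    total = [[0] * size for _ in range(size)]
    for word, c in s.items():
        m = [[int(r == q) for q in range(size)] for r in range(size)]  # identity
        for j in word:
            m = mat_mul(m, ts[j])
        total = [[total[r][q] + c * m[r][q] for q in range(size)]
                 for r in range(size)]
    return total


ts = [[[1, 1], [0, 1]],      # hypothetical choice of t_1 (index 0)
      [[2, 0], [1, -1]]]     # hypothetical choice of t_2 (index 1)
s = {(): 3, (0, 1): 1}       # 3 + X_1 X_2
t = {(1,): 2, (1, 0): -1}    # 2 X_2 - X_2 X_1
```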


8.  Symmetric  Algebra 

We continue to allow $\mathbb{K}$ to be an arbitrary field. Let $E$ be a vector space over
$\mathbb{K}$, and let $T(E)$ be the tensor algebra. We begin by defining the symmetric
algebra $S(E)$. This is to be a version of $T(E)$ in which the elements, which are
called symmetric tensors, commute with one another. It will not be canonically
an algebra of polynomials, as we shall see presently, and thus we make no use of
polynomial rings in the construction.

Just as the vector space of $n$-multilinear forms $E \times \cdots \times E \to \mathbb{K}$ is canonically
the dual of $T^n(E)$, so the vector space of symmetric $n$-multilinear forms will be
canonically the dual of $S^n(E)$. Here “symmetric” means that $f(x_1, \dots, x_n) =
f(x_{\tau(1)}, \dots, x_{\tau(n)})$ for every permutation $\tau$ in the symmetric group $\mathfrak{S}_n$.

Since tensor algebras are supposed to be universal devices for constructing
associative algebras over $\mathbb{K}$, whether commutative or not, we seek to form $S(E)$
as a quotient of $T(E)$. If $q$ is the quotient homomorphism, we want to have
$q(u \otimes v) = q(v \otimes u)$ in $S(E)$ whenever $u$ and $v$ are in $\iota(E) = T^1(E)$. Hence
every element $u \otimes v - v \otimes u$ is to be in the kernel of the homomorphism. On the
other hand, we do not want to impose any unnecessary conditions on our quotient,
and so we factor out only what the elements $u \otimes v - v \otimes u$ force us to factor out.
Thus we define the symmetric algebra by

$$S(E) = T(E)/I,$$

where $I$ is the two-sided ideal generated by all $u \otimes v - v \otimes u$ with $u$ and $v$
in $T^1(E)$. Then $S(E)$ is an associative algebra with identity.

Let us see that the fact that the generators of the ideal $I$ are homogeneous
elements (all being in $T^2(E)$) implies that

$$I = \bigoplus_{n=0}^{\infty} (I \cap T^n(E)).$$

In fact, each $I \cap T^n(E)$ is contained in $I$, and hence $I$ contains the right side.
On the other hand, if $x$ is any element of $I$, then $x$ is a sum of terms of the form
$a \otimes (u \otimes v - v \otimes u) \otimes b$, and we may assume that each $a$ and $b$ is homogeneous.
Any individual term $a \otimes (u \otimes v - v \otimes u) \otimes b$ is in some $I \cap T^n(E)$, and $x$ is
exhibited as a sum of members of the various intersections $I \cap T^n(E)$.

An ideal with the property $I = \bigoplus_{n=0}^{\infty} (I \cap T^n(E))$ is said to be homogeneous.
Since $I$ is homogeneous,

$$S(E) = \bigoplus_{n=0}^{\infty} T^n(E)/(I \cap T^n(E)).$$

We write $S^n(E)$ for the $n$th summand on the right side, so that

$$S(E) = \bigoplus_{n=0}^{\infty} S^n(E).$$

Since $I \cap T^1(E) = 0$, the map of $E \to T^1(E) \to S^1(E)$ into first-order elements
is one-one onto. The product operation in $S(E)$ is written without a product sign,
the image in $S^n(E)$ of $v_1 \otimes \cdots \otimes v_n$ in $T^n(E)$ being written as $v_1 \cdots v_n$. If $a$ is in
$S^m(E)$ and $b$ is in $S^n(E)$, then $ab$ is in $S^{m+n}(E)$. Moreover, $S^n(E)$ is generated
by elements $v_1 \cdots v_n$ with all $v_j$ in $S^1(E) = E$, since $T^n(E)$ is generated by
corresponding elements $v_1 \otimes \cdots \otimes v_n$. The defining relations for $S(E)$ make
$v_iv_j = v_jv_i$ for $v_i$ and $v_j$ in $S^1(E)$, and it follows that the associative algebra
$S(E)$ is commutative. □
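A small Python model of the quotient map $q : T(E) \to S(E)$, assuming $E$ has a finite basis $u_0, u_1, \dots$ and anticipating the monomial basis of Proposition 6.26 below: a monomial $u_{i_1} \cdots u_{i_n}$ is stored as its sorted index tuple, so forgetting the order of the factors in a word implements $q$ on basis elements. The sketch checks on examples that $q$ kills the generator $u_0 \otimes u_1 - u_1 \otimes u_0$ of $I$, that the product in $S(E)$ is commutative, and that $q$ is multiplicative. The names `tmul`, `to_sym`, and `smul` are illustrative.

```python
from collections import defaultdict


def tmul(s, t):
    """Product in T(E) on the word basis: concatenate words."""
    out = defaultdict(int)
    for w1, c1 in s.items():
        for w2, c2 in t.items():
            out[w1 + w2] += c1 * c2
    return {w: c for w, c in out.items() if c != 0}


def to_sym(t):
    """Quotient map q : T(E) -> S(E); the word (i_1, ..., i_n) goes to the
    monomial u_{i_1} ... u_{i_n}, stored as the sorted index tuple."""
    out = defaultdict(int)
    for word, c in t.items():
        out[tuple(sorted(word))] += c
    return {m: c for m, c in out.items() if c != 0}


def smul(s, t):
    """Product in S(E): merge the index multisets of the monomials."""
    out = defaultdict(int)
    for m1, c1 in s.items():
        for m2, c2 in t.items():
            out[tuple(sorted(m1 + m2))] += c1 * c2
    return {m: c for m, c in out.items() if c != 0}


generator = {(0, 1): 1, (1, 0): -1}   # u_0 ⊗ u_1 - u_1 ⊗ u_0, a generator of I
s = {(1, 0): 2, (2,): -1}             # 2 u_1 ⊗ u_0 - u_2 in T(E)
t = {(): 1, (0, 2): 3}                # 1 + 3 u_0 ⊗ u_2 in T(E)
```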

Proposition 6.23. Let $E$ be a vector space over the field $\mathbb{K}$.

(a) Let $\iota$ be the $n$-multilinear function $\iota(v_1, \dots, v_n) = v_1 \cdots v_n$ of $E \times \cdots \times E$
into $S^n(E)$. Then $(S^n(E), \iota)$ has the following universal mapping property:
whenever $l$ is any symmetric $n$-multilinear map of $E \times \cdots \times E$ into a vector
space $U$, then there exists a unique linear map $L : S^n(E) \to U$ such that the
diagram

$$\begin{array}{ccc}
E \times \cdots \times E & \xrightarrow{\;l\;} & U \\
{\scriptstyle \iota}\Big\downarrow & \nearrow{\scriptstyle L} & \\
S^n(E) & &
\end{array}$$

commutes.

(b) Let $\iota$ be the one-one linear function that embeds $E$ as $S^1(E) \subseteq S(E)$.
Then $(S(E), \iota)$ has the following universal mapping property: whenever $l$ is
any linear map of $E$ into a commutative associative algebra $A$ with identity, then
there exists a unique algebra homomorphism $L : S(E) \to A$ with $L(1) = 1$ such
that the diagram

$$\begin{array}{ccc}
E & \xrightarrow{\;l\;} & A \\
{\scriptstyle \iota}\Big\downarrow & \nearrow{\scriptstyle L} & \\
S(E) & &
\end{array}$$

commutes.

PROOF. In both cases uniqueness is trivial. For existence we use the universal
mapping properties of $T^n(E)$ and $T(E)$ to produce $L$ on $T^n(E)$ or $T(E)$. If we
can show that $L$ annihilates the appropriate subspace so as to descend to $S^n(E)$
or $S(E)$, then the resulting map can be taken as $L$, and we are done. For (a), we
have $L : T^n(E) \to U$, and we are to show that $L(T^n(E) \cap I) = 0$, where $I$ is
generated by all $u \otimes v - v \otimes u$ with $u$ and $v$ in $T^1(E)$. A member of $T^n(E) \cap I$
is thus of the form $\sum_i a_i \otimes (u_i \otimes v_i - v_i \otimes u_i) \otimes b_i$ with each term in $T^n(E)$.
Each term here is a sum of differences of pure tensors of the form

$$x_1 \otimes \cdots \otimes x_r \otimes u_i \otimes v_i \otimes y_1 \otimes \cdots \otimes y_s - x_1 \otimes \cdots \otimes x_r \otimes v_i \otimes u_i \otimes y_1 \otimes \cdots \otimes y_s \qquad (*)$$
with $r + 2 + s = n$. Since $l$ by assumption takes equal values on

$$(x_1, \dots, x_r, u_i, v_i, y_1, \dots, y_s) \quad \text{and} \quad (x_1, \dots, x_r, v_i, u_i, y_1, \dots, y_s),$$

$L$ vanishes on (*), and it follows that $L(T^n(E) \cap I) = 0$.

For (b) we are to show that $L : T(E) \to A$ vanishes on $I$. Since $\ker L$
is an ideal, it is enough to check that $L$ vanishes on the generators of $I$. But
$L(u \otimes v - v \otimes u) = l(u)l(v) - l(v)l(u) = 0$ by the commutativity of $A$, and thus
$L(I) = 0$. □
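The role of commutativity in part (b) shows up in a short computation. In this Python sketch (basis $u_0, u_1$ of $E$, elements of $T(E)$ stored as dictionaries from words to coefficients; the sample values and names are illustrative assumptions), the homomorphism $L$ determined by values $l(u_i)$ in the commutative algebra $\mathbb{Q}$ kills the generator $u_0 \otimes u_1 - u_1 \otimes u_0$, while the homomorphism determined by values in the noncommutative algebra $M_2(\mathbb{Q})$ does not, so only the first descends to $S(E)$.

```python
from math import prod


def mat_mul(X, Y):
    """Product of two square matrices given as lists of rows."""
    size = len(X)
    return [[sum(X[r][m] * Y[m][c] for m in range(size)) for c in range(size)]
            for r in range(size)]


def eval_in_numbers(t, values):
    """L : T(E) -> K for the linear map l(u_i) = values[i]; K is commutative."""
    return sum(c * prod(values[i] for i in word) for word, c in t.items())


def eval_in_matrices(t, mats):
    """L : T(E) -> M_2(K) for l(u_i) = mats[i]; M_2(K) is not commutative."""
    total = [[0, 0], [0, 0]]
    for word, c in t.items():
        m = [[1, 0], [0, 1]]
        for i in word:
            m = mat_mul(m, mats[i])
        total = [[total[r][s] + c * m[r][s] for s in range(2)] for r in range(2)]
    return total


# The generator u_0 ⊗ u_1 - u_1 ⊗ u_0 of the ideal I.
commutator = {(0, 1): 1, (1, 0): -1}

in_numbers = eval_in_numbers(commutator, [3, 5])   # l(u_0) l(u_1) - l(u_1) l(u_0)
in_matrices = eval_in_matrices(commutator, [[[0, 1], [0, 0]], [[0, 0], [1, 0]]])
```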

Corollary 6.24. If $E$ and $F$ are vector spaces over the field $\mathbb{K}$, then the
vector space $\mathrm{Hom}_{\mathbb{K}}(S^n(E), F)$ is canonically isomorphic (via restriction to pure
tensors) to the vector space of all $F$-valued symmetric $n$-multilinear functions on
$E \times \cdots \times E$.

PROOF. Restriction is linear and one-one. It is onto by Proposition 6.23a. □

Corollary 6.25. If $E$ is a vector space over the field $\mathbb{K}$, then the dual $(S^n(E))'$
of $S^n(E)$ is canonically isomorphic (via restriction to pure tensors) to the vector
space of symmetric $n$-multilinear forms on $E \times \cdots \times E$.

Proof. This is a special case of Corollary 6.24. □

If $\varphi : E \to F$ is a linear map between vector spaces, then we can use
Proposition 6.23b to define a corresponding homomorphism $\Phi : S(E) \to S(F)$
of associative algebras with identity. In this way, we can make $E \mapsto S(E)$ into a
functor from the category of vector spaces over $\mathbb{K}$ to the category of commutative
associative algebras with identity over $\mathbb{K}$. The details appear in Problem 14 at
the end of the chapter.

Next we shall identify a basis for $S^n(E)$ as a vector space. The union of such
bases as $n$ varies will then be a basis of $S(E)$. Let $\{u_i\}_{i \in A}$ be a basis of $E$, possibly
infinite. As noted in Section A5 of the appendix, a simple ordering on the index
set $A$ is a partial ordering in which every pair of elements is comparable and in
which $a \leq b$ and $b \leq a$ together imply $a = b$.

Proposition 6.26. Let $E$ be a vector space over the field $\mathbb{K}$, let $\{u_i\}_{i \in A}$ be a
basis of $E$, and suppose that a simple ordering has been imposed on the index set
$A$. Then the set of all monomials $u_{i_1}^{j_1} \cdots u_{i_k}^{j_k}$ with $i_1 < \cdots < i_k$ and $\sum_{m=1}^{k} j_m = n$
is a basis of $S^n(E)$.
Remark. In particular if $E$ is finite-dimensional with $\{u_1, \dots, u_N\}$ as an
ordered basis, then the monomials $u_1^{j_1} \cdots u_N^{j_N}$ of total degree $n$ form a basis of
$S^n(E)$.

PROOF. Since $S(E)$ is commutative and since $n$-fold products of elements $\iota(u_i)$
in $T^1(E)$ span $T^n(E)$, the indicated set of monomials spans $S^n(E)$. Let us see that
the set is linearly independent. Take any finite subset $F \subseteq A$ of indices. The map
$\sum_{i \in A} c_i u_i \mapsto \sum_{i \in F} c_i X_i$ of $E$ into the polynomial algebra $\mathbb{K}[\{X_i\}_{i \in F}]$ is linear
into a commutative algebra with identity. Its extension via Proposition 6.23b maps
all monomials in the $u_i$ for $i \in F$ into distinct monomials in $\mathbb{K}[\{X_i\}_{i \in F}]$, which
are necessarily linearly independent. Hence any finite subset of the monomials in
the statement of the proposition is linearly independent, and the whole set must
be linearly independent. Therefore our spanning set is a basis. □
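When $E$ is finite-dimensional, the basis in Proposition 6.26 can be listed mechanically. The following Python sketch is only an illustration under our own conventions. The function name `symmetric_basis` is hypothetical and indices are 0-based. It encodes a monomial $u_1^{j_1} \cdots u_N^{j_N}$ by its exponent vector and builds the list from nondecreasing index sequences $i_1 \le \cdots \le i_n$. It then compares the count with the binomial coefficient in Corollary 6.27a.

```python
from itertools import combinations_with_replacement
from math import comb


def symmetric_basis(N, n):
    """Exponent vectors (j_1, ..., j_N) with j_1 + ... + j_N = n.

    Each vector stands for the monomial u_1^{j_1} ... u_N^{j_N} in S^n(E),
    where u_1, ..., u_N is an ordered basis of E (0-based indices here).
    """
    basis = []
    # A nondecreasing index sequence i_1 <= ... <= i_n carries the same data
    # as a monomial in commuting symbols; count how often each index occurs.
    for indices in combinations_with_replacement(range(N), n):
        exps = [0] * N
        for i in indices:
            exps[i] += 1
        basis.append(tuple(exps))
    return basis


if __name__ == "__main__":
    b = symmetric_basis(3, 2)
    print(b)
    print(len(b), comb(2 + 3 - 1, 3 - 1))  # both counts are 6
```

The enumeration is a direct form of the "dividers" count used in the proof of Corollary 6.27a.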

The proof of Proposition 6.26 shows that, once a basis of $E$ has been chosen, $S(E)$ may be
identified with the polynomials in indeterminates corresponding to the members of that basis.
This identification depends on the choice of basis. Indeed, if we think
of $E$ as specified in advance, then the isomorphism was set up by mapping the set
of indeterminates $\{X_i\}$ to the specified basis of $E$, and the result certainly depended on what basis
was used. Nevertheless, if $E$ is finite-dimensional, there is still an isomorphism
that is independent of basis. It is between $S(E')$, where $E'$ is the dual of $E$, and
a natural basis-free notion of “polynomials” on $E$. We return to this point after
one application of Proposition 6.26.

Corollary 6.27. Let $E$ be a finite-dimensional vector space over $\mathbb{K}$ of dimension $N$. Then

(a) $\dim S^n(E) = \binom{n+N-1}{N-1}$ for $0 \le n < \infty$,

(b) $S^n(E')$ is canonically isomorphic to $S^n(E)'$ in such a way that

$$(f_1 \cdots f_n)(w_1 \cdots w_n) = \sum_{\tau \in \mathfrak{S}_n} \prod_{j=1}^{n} f_j(w_{\tau(j)})$$

for any $f_1, \dots, f_n$ in $E'$ and any $w_1, \dots, w_n$ in $E$, provided $\mathbb{K}$ has
characteristic 0; here $\mathfrak{S}_n$ is the symmetric group on $n$ letters.

PROOF. For (a), a basis has been described in Proposition 6.26. To see its
cardinality, we recognize that picking out $N - 1$ objects from $n + N - 1$ to label
as dividers is a way of assigning exponents to the $u_j$'s in an ordered basis. Thus
the cardinality of the indicated basis is

$$\binom{n+N-1}{N-1}.$$
For (b), let $f_1, \dots, f_n$ be in $E'$ and $w_1, \dots, w_n$ be in $E$, and define

$$\ell_{f_1, \dots, f_n}(w_1, \dots, w_n) = \sum_{\tau \in \mathfrak{S}_n} \prod_{j=1}^{n} f_j(w_{\tau(j)}).$$

Then $\ell_{f_1, \dots, f_n}$ is symmetric $n$-multilinear from $E \times \cdots \times E$ into $\mathbb{K}$. By Proposition 6.23a it
extends to a linear $L_{f_1, \dots, f_n} : S^n(E) \to \mathbb{K}$. Thus $\ell(f_1, \dots, f_n) = L_{f_1, \dots, f_n}$
defines a symmetric $n$-multilinear map of $E' \times \cdots \times E'$ into $S^n(E)'$. Its
linear extension $L$ maps $S^n(E')$ into $S^n(E)'$.

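To complete the proof, we shall show that $L$ carries a basis of $S^n(E')$ to nonzero multiples
of a basis of $S^n(E)'$. Let $u_1, \dots, u_N$ be an ordered basis of $E$, and let $u'_1, \dots, u'_N$ be the dual basis.
Part (a) shows two things. The elements $(u'_1)^{j_1} \cdots (u'_N)^{j_N}$ with $\sum_m j_m = n$ form a basis of
$S^n(E')$, and the elements $(u_1)^{k_1} \cdots (u_N)^{k_N}$ with $\sum_m k_m = n$ form a basis of
$S^n(E)$. We show that $L$ of the basis of $S^n(E')$ is the dual basis of the basis of
$S^n(E)$, except for positive-integer factors. Thus let all of $f_1, \dots, f_{j_1}$ be $u'_1$, let
all of $f_{j_1+1}, \dots, f_{j_1+j_2}$ be $u'_2$, and so on. Similarly let all of $w_1, \dots, w_{k_1}$ be $u_1$,
let all of $w_{k_1+1}, \dots, w_{k_1+k_2}$ be $u_2$, and so on. Then

$$\begin{aligned}
L\big((u'_1)^{j_1} \cdots (u'_N)^{j_N}\big)\big((u_1)^{k_1} \cdots (u_N)^{k_N}\big) &= L(f_1 \cdots f_n)(w_1 \cdots w_n) \\
&= \ell(f_1, \dots, f_n)(w_1 \cdots w_n) \\
&= \sum_{\tau \in \mathfrak{S}_n} \prod_{i=1}^{n} f_i(w_{\tau(i)}).
\end{aligned}$$

Write $J_m = j_1 + \cdots + j_m$ and $K_m = k_1 + \cdots + k_m$, with $J_0 = K_0 = 0$. For given $\tau$,
the product on the right side is 0 unless, for each index $i$, an inequality
$J_{m-1} + 1 \le i \le J_m$ implies that $K_{m-1} + 1 \le \tau(i) \le K_m$. In this case the product
is 1, so the right side counts the number of such $\tau$'s. For given $\tau$, obtaining a
nonzero product forces $k_m = j_m$ for all $m$. When $k_m = j_m$ for all $m$, the
choice $\tau = 1$ does lead to product 1; in fact exactly $j_1! \cdots j_N!$ choices of $\tau$ do. Hence the members of $L$ of the basis are
positive-integer multiples of the members of the dual basis, as asserted. Since $\mathbb{K}$ has characteristic 0, these multiples are nonzero, and $L$ is an isomorphism. □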
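The pairing in Corollary 6.27b is a permanent: $(f_1 \cdots f_n)(w_1 \cdots w_n)$ is the permanent of the matrix $\{f_i(w_j)\}$. The Python sketch below evaluates this pairing on the basis elements used in the proof and exhibits the positive-integer factor $j_1! \cdots j_N!$. The helper names are hypothetical. Vectors and functionals are stored as coordinate lists relative to $u_1, \dots, u_N$ and the dual basis, so the same list represents $u_i$ and $u'_i$.

```python
from itertools import permutations
from math import factorial, prod


def evaluate(f, w):
    """f(w) for f in E' and w in E, both given by coordinates relative to
    the dual bases u'_1, ..., u'_N and u_1, ..., u_N."""
    return sum(a * b for a, b in zip(f, w))


def pairing(fs, ws):
    """(f_1 ... f_n)(w_1 ... w_n) = sum over tau in S_n of prod_j f_j(w_tau(j))."""
    n = len(fs)
    return sum(
        prod(evaluate(fs[j], ws[tau[j]]) for j in range(n))
        for tau in permutations(range(n))
    )


def unit(i, N):
    """Coordinates of the i-th basis vector (0-based)."""
    return [1 if k == i else 0 for k in range(N)]


def repeated(exps, N):
    """u_1 repeated j_1 times, then u_2 repeated j_2 times, and so on."""
    out = []
    for i, j in enumerate(exps):
        out.extend([unit(i, N)] * j)
    return out


if __name__ == "__main__":
    N = 3
    j, k = (2, 1, 0), (1, 2, 0)
    # Same exponents: the value is the positive integer j_1! j_2! j_3! = 2.
    print(pairing(repeated(j, N), repeated(j, N)), prod(factorial(e) for e in j))
    # Different exponents: no tau gives a nonzero product, so the value is 0.
    print(pairing(repeated(j, N), repeated(k, N)))
```

Summing over all of $\mathfrak{S}_n$ costs $n!$ terms. That is fine for checking small cases, but it is not a practical way to compute.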

Let us return to the question of introducing a basis-free notion of polynomials
on the vector space $E$ under the assumption that $E$ is finite-dimensional. We take
a cue from Corollary 4.32. It tells us that the evaluation homomorphism
carrying $\mathbb{K}[X_1, \dots, X_n]$ to the algebra of $\mathbb{K}$-valued polynomial functions of
$(t_1, \dots, t_n)$ is one-one if $\mathbb{K}$ is an infinite field. We regard the latter as the algebra
of polynomial functions on $\mathbb{K}^n$, and we check what happens when we identify
the vector space $E$ with $\mathbb{K}^n$ by fixing a basis. Let $\Gamma = \{x_1, \dots, x_n\}$ be a basis of
$E$, and let $\Gamma' = \{x'_1, \dots, x'_n\}$ be the dual basis of $E'$. If $e = t_1 x_1 + \cdots + t_n x_n$ is
the expansion of a member of $E$ in terms of $\Gamma$, then we have $x'_j(e) = t_j$. Thus the
polynomial functions $t_j$ are given by the members of the dual basis. The vector
space of all homogeneous first-degree polynomial functions is the set of linear
combinations of the $t_j$'s, and these are given by arbitrary linear functionals on $E$.
Thus the vector space of homogeneous first-degree polynomial functions on $E$ is
just the dual space $E'$, and this conclusion does not depend on the choice of basis.
The algebra of all polynomial functions on $E$ is then the algebra of all $\mathbb{K}$-valued
functions on $E$ generated by $E'$ and the constant functions.

This discussion tells us unambiguously what polynomial functions on $E$ are
to be, and we want to backtrack to handle abstract polynomials on $E$. The
evaluation homomorphism from $\mathbb{K}[X_1, \dots, X_n]$ to the algebra of polynomial
functions on $\mathbb{K}^n$ may fail to be one-one if $\mathbb{K}$ is a finite field. Its restriction to
homogeneous first-degree polynomials is nevertheless one-one, since a linear form $c_1 X_1 + \cdots + c_n X_n$
takes the value $c_j$ at the $j$th standard basis vector. Thus, whatever we might
mean by the vector space of homogeneous first-degree polynomials on $E$, the
evaluation mapping should exhibit this space as isomorphic to $E'$.
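A brute-force check shows how evaluation can fail to be one-one over a finite field while staying one-one in degree 1. The Python sketch below assumes $\mathbb{K} = \mathbb{F}_2$ and $n = 2$. It represents a polynomial as a dictionary from exponent tuples to coefficients, a representation chosen only for this illustration. The nonzero homogeneous cubic $X_1^2 X_2 + X_1 X_2^2$ vanishes at every point of $\mathbb{F}_2^2$, while every nonzero linear form gives a nonzero function.

```python
from itertools import product

P = 2  # the finite field F_2 = {0, 1}, arithmetic mod P


def evaluate(poly, point):
    """Value mod P of a polynomial {exponent tuple: coefficient} at a point."""
    total = 0
    for exps, c in poly.items():
        term = c
        for t, e in zip(point, exps):
            term *= t ** e  # 0 ** 0 == 1 in Python, as needed here
        total += term
    return total % P


def is_zero_function(poly, N=2):
    """True if poly vanishes at every point of F_P^N."""
    return all(evaluate(poly, pt) == 0 for pt in product(range(P), repeat=N))


# X1^2 X2 + X1 X2^2 is a nonzero homogeneous polynomial of degree 3,
# yet as a function on F_2^2 it is identically 0.
cubic = {(2, 1): 1, (1, 2): 1}

# Each nonzero linear form c1 X1 + c2 X2 gives a nonzero function.
linear_forms = [
    {(1, 0): c1, (0, 1): c2}
    for c1, c2 in product(range(P), repeat=2)
    if (c1, c2) != (0, 0)
]

if __name__ == "__main__":
    print(is_zero_function(cubic))
    print(all(not is_zero_function(f) for f in linear_forms))
```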

Armed with these clues, we define the polynomial algebra $P(E)$ on $E$ to be
the symmetric algebra $S(E')$ if $E$ is finite-dimensional. We need an evaluation
mapping for each point $e$ of $E$, and we obtain this from the universal mapping
property of symmetric algebras (Proposition 6.23b). With $e$ fixed, we have a
linear map $\ell$ from the vector space $E'$ to the commutative associative algebra
$\mathbb{K}$ given by $\ell(e') = e'(e)$. The universal mapping property gives us a unique
algebra homomorphism $L : S(E') \to \mathbb{K}$ that extends $\ell$ and carries 1 to 1. The
algebra homomorphism $L$ is then a multiplicative linear functional on $P(E) = S(E')$.
It carries 1 to 1 and agrees with evaluation at $e$ on homogeneous
first-degree polynomials. We write this homomorphism as $p \mapsto p(e)$, and we define
$P^n(E) = S^n(E')$; this is the vector space of homogeneous $n$th-degree polynomials
on $E$. A confirmation that $P(E)$ is indeed to be regarded as the algebra of abstract
polynomials on $E$ comes from the following.

Proposition 6.28. If $E$ is a finite-dimensional vector space over the field
$\mathbb{K}$, then the system of evaluation homomorphisms $P(E) \to \mathbb{K}$ on polynomials
given by $p \mapsto \{p(e)\}_{e \in E}$ is an algebra homomorphism of $P(E)$ onto the algebra
of $\mathbb{K}$-valued polynomial functions on $E$ that carries the identity to the constant
function 1, and it is one-one if $\mathbb{K}$ is an infinite field.

PROOF. Certainly $p \mapsto \{p(e)\}_{e \in E}$ is an algebra homomorphism of $P(E)$ into
the algebra of $\mathbb{K}$-valued polynomial functions on $E$, and it carries the identity to
the constant function 1. We have seen that the image of $P^1(E)$ is exactly $E'$. Hence
the image of $P(E)$ is the algebra of $\mathbb{K}$-valued functions on $E$ generated
by $E'$ and the constants. This is exactly the algebra of all $\mathbb{K}$-valued polynomial
functions, and hence the mapping is onto.

Suppose that $\mathbb{K}$ is infinite. The restriction of $p \mapsto \{p(e)\}_{e \in E}$ to the
finite-dimensional subspace $P^n(E)$ of $P(E)$ maps into the finite-dimensional subspace
of all polynomial functions on $E$ homogeneous of degree $n$. This restriction
is onto, since that subspace is spanned by products of $n$ members of $E'$. We can read off the dimension of the space of all
polynomial functions on $E$ homogeneous of degree $n$ from Corollary 4.32 and
Corollary 6.27a. This dimension matches the dimension of $P^n(E) = S^n(E')$, according to
Corollary 6.27a. Since the mapping is onto and the finite dimensions match, the
restricted mapping is one-one. Because $\mathbb{K}$ is infinite, Corollary 4.32 also shows that the space of polynomial functions on $E$ is the direct sum of its homogeneous subspaces. Hence $p \mapsto \{p(e)\}_{e \in E}$ is one-one. □

We have defined the symmetric algebra $S(E)$ as a quotient of the tensor algebra
$T(E)$. Now let us suppose that $\mathbb{K}$ has characteristic 0. With this hypothesis we
shall be able to identify an explicit vector subspace of $T(E)$ that maps one-one
onto $S(E)$ during the passage to the quotient. This subspace of $T(E)$ can therefore
be viewed as a version of $S(E)$ for some purposes.

We define an $n$-multilinear function from $E \times \cdots \times E$ into $T^n(E)$ by

$$(v_1, \dots, v_n) \mapsto \frac{1}{n!} \sum_{\tau \in \mathfrak{S}_n} v_{\tau(1)} \otimes \cdots \otimes v_{\tau(n)},$$

and let $\sigma : T^n(E) \to T^n(E)$ be its linear extension. We call $\sigma$ the symmetrizer
operator. The image of $\sigma$ in $T^n(E)$ is denoted by $\widetilde{S}^n(E)$, and the members of this
subspace are called symmetrized tensors.

Proposition 6.29. Let the field $\mathbb{K}$ have characteristic 0, and let $E$ be a vector
space over $\mathbb{K}$. Then the symmetrizer operator $\sigma$ satisfies $\sigma^2 = \sigma$. The kernel of
$\sigma$ on $T^n(E)$ is exactly $T^n(E) \cap I$, and therefore

$$T^n(E) = \widetilde{S}^n(E) \oplus (T^n(E) \cap I).$$


Remark. In view of this proposition, the quotient map $T^n(E) \to S^n(E)$ carries
$\widetilde{S}^n(E)$ one-one onto $S^n(E)$. Thus $\widetilde{S}^n(E)$ can be viewed as a copy of $S^n(E)$
embedded as a direct summand of $T^n(E)$.

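Proof. We have

$$\begin{aligned}
\sigma^2(v_1 \otimes \cdots \otimes v_n)
&= \frac{1}{(n!)^2} \sum_{\rho, \tau \in \mathfrak{S}_n} v_{\rho\tau(1)} \otimes \cdots \otimes v_{\rho\tau(n)} \\
&= \frac{1}{(n!)^2} \sum_{\rho \in \mathfrak{S}_n} \sum_{\substack{\omega \in \mathfrak{S}_n \\ (\omega = \rho\tau)}} v_{\omega(1)} \otimes \cdots \otimes v_{\omega(n)} \\
&= \frac{1}{n!} \sum_{\rho \in \mathfrak{S}_n} \sigma(v_1 \otimes \cdots \otimes v_n) \\
&= \sigma(v_1 \otimes \cdots \otimes v_n).
\end{aligned}$$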
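Hence $\sigma^2 = \sigma$. Thus $\sigma$ fixes any member of $\operatorname{image} \sigma$, and it follows that
$\operatorname{image} \sigma \cap \ker \sigma = 0$. Every $t$ also satisfies $t = \sigma(t) + (t - \sigma(t))$ with $t - \sigma(t) \in \ker \sigma$. Consequently $T^n(E)$ is the direct sum of $\operatorname{image} \sigma$ and
$\ker \sigma$. We are left with identifying $\ker \sigma$ as $T^n(E) \cap I$.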

The subspace $T^n(E) \cap I$ is spanned by elements

$$x_1 \otimes \cdots \otimes x_r \otimes u \otimes v \otimes y_1 \otimes \cdots \otimes y_s - x_1 \otimes \cdots \otimes x_r \otimes v \otimes u \otimes y_1 \otimes \cdots \otimes y_s$$

with $r + 2 + s = n$, and the symmetrizer $\sigma$ certainly vanishes on such elements.
Hence $T^n(E) \cap I \subseteq \ker \sigma$. Suppose that the inclusion is strict, say with $t$ in
$\ker \sigma$ but $t$ not in $T^n(E) \cap I$. Let $q$ be the quotient map $T^n(E) \to S^n(E)$.
The kernel of $q$ is $T^n(E) \cap I$, and thus $q(t) \ne 0$. By Proposition 6.26, the
$T(E)$ monomials in basis elements from $E$ with nondecreasing indices map onto a
basis of $S(E)$. Since $\mathbb{K}$ has characteristic 0, the symmetrized versions of these
monomials map to nonzero multiples of the images of the initial monomials.
Consequently $q$ carries $\widetilde{S}^n(E) = \operatorname{image} \sigma$ onto $S^n(E)$. Thus choose $t' \in \widetilde{S}^n(E)$
with $q(t') = q(t)$. Then $t' - t$ is in $\ker q = T^n(E) \cap I \subseteq \ker \sigma$. Since $\sigma(t) = 0$,
we see that $\sigma(t') = 0$. Consequently $t'$ is in $\ker \sigma \cap \operatorname{image} \sigma = 0$, and we obtain
$t' = 0$ and $q(t) = q(t') = 0$, a contradiction. □
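The identity $\sigma^2 = \sigma$ and the vanishing of $\sigma$ on $T^n(E) \cap I$ can be checked numerically in a small case. This Python sketch assumes $E = \mathbb{Q}^N$ with standard basis $e_0, \dots, e_{N-1}$, so that $\mathbb{K} = \mathbb{Q}$ has characteristic 0. A tensor in $T^n(E)$ is stored as a dictionary from index tuples $(i_1, \dots, i_n)$, standing for $e_{i_1} \otimes \cdots \otimes e_{i_n}$, to exact rational coefficients. The function name `symmetrize` is hypothetical.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def symmetrize(t):
    """Apply the symmetrizer sigma to a tensor in T^n(Q^N).

    A tensor is a dict {(i_1, ..., i_n): coefficient}; the key stands for
    e_{i_1} (x) ... (x) e_{i_n}. sigma averages over all permutations.
    """
    out = {}
    for idx, c in t.items():
        n = len(idx)
        for tau in permutations(range(n)):
            # the term v_tau(1) (x) ... (x) v_tau(n) with v_k = e_{i_k}
            key = tuple(idx[tau[k]] for k in range(n))
            out[key] = out.get(key, 0) + Fraction(c) / factorial(n)
    return {k: v for k, v in out.items() if v != 0}  # drop zero coefficients


if __name__ == "__main__":
    t = {(0, 1, 1): 1, (2, 0, 1): 3}
    print(symmetrize(symmetrize(t)) == symmetrize(t))  # sigma^2 = sigma
    # e_0 (x) (e_1 (x) e_2 - e_2 (x) e_1) spans part of T^3(E) ∩ I; sigma kills it
    print(symmetrize({(0, 1, 2): 1, (0, 2, 1): -1}))
```

Exact `Fraction` arithmetic avoids floating-point round-off, so the idempotence check is a true equality.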


9.  Exterior  Algebra 

We turn to a discussion of the exterior algebra. Let $\mathbb{K}$ be an arbitrary field, and
let $E$ be a vector space over $\mathbb{K}$. The construction, results, and proofs for the
exterior algebra $\bigwedge(E)$ are similar to those for the symmetric algebra $S(E)$. The
elements of $\bigwedge(E)$ are to be all the alternating tensors (= skew-symmetric if $\mathbb{K}$
has characteristic $\ne 2$), and so we want to force $v \otimes v = 0$. Thus we define the
exterior algebra by

$$\textstyle\bigwedge(E) = T(E)/I', \qquad \text{where } I' = \left(\begin{array}{c}\text{two-sided ideal generated by all} \\ v \otimes v \text{ with } v \text{ in } T^1(E)\end{array}\right).$$

Then $\bigwedge(E)$ is an associative algebra with identity.

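It is clear that $I'$ is homogeneous: $I' = \bigoplus_{n=0}^{\infty} (I' \cap T^n(E))$. Thus we can
write

$$\textstyle\bigwedge(E) = \bigoplus_{n=0}^{\infty} T^n(E)/(I' \cap T^n(E)).$$

We write $\bigwedge^n(E)$ for the $n$th summand on the right side, so that

$$\textstyle\bigwedge(E) = \bigoplus_{n=0}^{\infty} \bigwedge^n(E).$$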
Since $I' \cap T^1(E) = 0$, the map of $E$ into first-order elements $\bigwedge^1(E)$ is one-one
onto. The product operation in $\bigwedge(E)$ is denoted by $\wedge$ rather than $\otimes$. The image in
$\bigwedge^n(E)$ of $v_1 \otimes \cdots \otimes v_n$ in $T^n(E)$ is denoted by $v_1 \wedge \cdots \wedge v_n$. If $a$ is in $\bigwedge^m(E)$
and $b$ is in $\bigwedge^n(E)$, then $a \wedge b$ is in $\bigwedge^{m+n}(E)$. Moreover, $\bigwedge^n(E)$ is generated
by elements $v_1 \wedge \cdots \wedge v_n$ with all $v_j$ in $\bigwedge^1(E) = E$, since $T^n(E)$ is generated
by corresponding elements $v_1 \otimes \cdots \otimes v_n$. The defining relations for $\bigwedge(E)$ make
$v_i \wedge v_j = -v_j \wedge v_i$ for $v_i$ and $v_j$ in $\bigwedge^1(E)$ (expand $(v_i + v_j) \wedge (v_i + v_j) = 0$), and it follows that

$$a \wedge b = (-1)^{mn}\, b \wedge a \qquad \text{for } a \in \textstyle\bigwedge^m(E) \text{ and } b \in \textstyle\bigwedge^n(E).$$
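The sign rule $a \wedge b = (-1)^{mn} b \wedge a$ can be tested on basis monomials. In the Python sketch below, the name `wedge` and the index-tuple encoding are illustrative assumptions. A monomial $u_{i_1} \wedge \cdots \wedge u_{i_k}$ is a tuple of indices. The product of two monomials is rewritten with increasing indices, each adjacent transposition contributing a factor $-1$, and a repeated index gives 0.

```python
def wedge(a, b):
    """Wedge of basis monomials u_{a_1} wedge ... and u_{b_1} wedge ...,
    each given as a tuple of indices.

    Returns (sign, indices) with indices increasing, so the product equals
    sign * u_{i_1} wedge ... wedge u_{i_k}; sign is 0 when an index repeats
    (because v wedge v = 0).
    """
    idx = list(a) + list(b)
    if len(set(idx)) < len(idx):
        return 0, ()
    # Sorting idx takes as many adjacent swaps as idx has inversions,
    # and each adjacent swap of two degree-1 factors changes the sign.
    inversions = sum(
        1 for p in range(len(idx)) for q in range(p + 1, len(idx)) if idx[p] > idx[q]
    )
    return (-1) ** inversions, tuple(sorted(idx))


if __name__ == "__main__":
    a, b = (0, 2), (1, 3, 4)  # degrees m = 2 and n = 3: (-1)^(mn) = 1
    print(wedge(a, b), wedge(b, a))  # the two signs agree
    c, d = (0,), (1, 2, 3)  # degrees 1 and 3: (-1)^(mn) = -1
    print(wedge(c, d), wedge(d, c))  # the two signs are opposite
```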

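Proposition 6.30. Let $E$ be a vector space over the field $\mathbb{K}$.

(a) Let $\iota$ be the $n$-multilinear function $\iota(v_1, \dots, v_n) = v_1 \wedge \cdots \wedge v_n$ of $E \times \cdots \times E$
into $\bigwedge^n(E)$. Then $(\bigwedge^n(E), \iota)$ has the following universal mapping property.
Whenever $\ell$ is any alternating $n$-multilinear map of $E \times \cdots \times E$ into a vector space
$U$, there exists a unique linear map $L : \bigwedge^n(E) \to U$ such that the diagram

$$\begin{array}{ccc}
E \times \cdots \times E & \xrightarrow{\ \ \ell\ \ } & U \\
{\scriptstyle \iota}\big\downarrow & \nearrow{\scriptstyle L} & \\
\bigwedge^n(E) & &
\end{array}$$

commutes, that is, such that $L \circ \iota = \ell$.

(b) Let $\iota$ be the function that embeds $E$ as $\bigwedge^1(E) \subseteq \bigwedge(E)$. Then $(\bigwedge(E), \iota)$
has the following universal mapping property. Let $\ell$ be any linear map of
$E$ into an associative algebra $A$ with identity such that $\ell(v)^2 = 0$ for all $v \in E$.
Then there exists a unique algebra homomorphism $L : \bigwedge(E) \to A$ with $L(1) = 1$
such that the diagram

$$\begin{array}{ccc}
E & \xrightarrow{\ \ \ell\ \ } & A \\
{\scriptstyle \iota}\big\downarrow & \nearrow{\scriptstyle L} & \\
\bigwedge(E) & &
\end{array}$$

commutes, that is, such that $L \circ \iota = \ell$.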

PROOF.  The  proof  is  completely  analogous  to  the  proof  of  Proposition  6.23.  □ 

Corollary 6.31. If $E$ and $F$ are vector spaces over the field $\mathbb{K}$, then the
vector space $\operatorname{Hom}_{\mathbb{K}}(\bigwedge^n(E), F)$ is canonically isomorphic (via restriction to pure
tensors) to the vector space of all $F$-valued alternating $n$-multilinear functions on
$E \times \cdots \times E$.

PROOF.  Restriction  is  linear  and  one-one.  It  is  onto  by  Proposition  6.30a.  □ 
Corollary 6.32. If $E$ is a vector space over the field $\mathbb{K}$, then the dual $(\bigwedge^n(E))'$
of $\bigwedge^n(E)$ is canonically isomorphic (via restriction to pure tensors) to the vector
space of alternating $n$-multilinear forms on $E \times \cdots \times E$.

Proof.  This  is  a  special  case  of  Corollary  6.31.  □ 

If $\varphi : E \to F$ is a linear map between vector spaces, then we can use
Proposition 6.30b to define a corresponding homomorphism $\Phi : \bigwedge(E) \to \bigwedge(F)$
of associative algebras with identity. In this way, we can make $E \mapsto \bigwedge(E)$ into a
functor from the category of vector spaces over $\mathbb{K}$ to the category of
associative algebras with identity over $\mathbb{K}$. We omit the details, which are similar
to those for symmetric tensors.

Next we shall identify a basis for $\bigwedge^n(E)$ as a vector space. The union of such
bases as $n$ varies will then be a basis of $\bigwedge(E)$.

Proposition 6.33. Let $E$ be a vector space over the field $\mathbb{K}$, let $\{u_i\}_{i \in A}$ be a
basis of $E$, and suppose that a simple ordering has been imposed on the index set
$A$. Then the set of all monomials $u_{i_1} \wedge \cdots \wedge u_{i_n}$ with $i_1 < \cdots < i_n$ is a basis of
$\bigwedge^n(E)$.

PROOF. Multiplication in $\bigwedge(E)$ satisfies $a \wedge b = (-1)^{mn} b \wedge a$ for
$a \in \bigwedge^m(E)$ and $b \in \bigwedge^n(E)$, we have $u_i \wedge u_i = 0$, and monomials span $T^n(E)$. Hence the indicated set
spans $\bigwedge^n(E)$. Let us see that the set is linearly independent. For $i \in A$, let $u'_i$ be
the member of $E'$ with $u'_i(u_j)$ equal to 1 for $j = i$ and equal to 0 for $j \ne i$. Fix
$r_1 < \cdots < r_n$, and define

$$\ell(w_1, \dots, w_n) = \det\{u'_{r_i}(w_j)\} \qquad \text{for } w_1, \dots, w_n \text{ in } E.$$

Then $\ell$ is alternating $n$-multilinear from $E \times \cdots \times E$ into $\mathbb{K}$ and extends by
Proposition 6.30a to $L : \bigwedge^n(E) \to \mathbb{K}$. If $k_1 < \cdots < k_n$, then

$$L(u_{k_1} \wedge \cdots \wedge u_{k_n}) = \ell(u_{k_1}, \dots, u_{k_n}) = \det\{u'_{r_i}(u_{k_j})\},$$

and the right side is 0 unless $r_1 = k_1, \dots, r_n = k_n$, in which case it is 1. This
proves that the $u_{r_1} \wedge \cdots \wedge u_{r_n}$ are linearly independent in $\bigwedge^n(E)$. □

Corollary 6.34. Let $E$ be a finite-dimensional vector space over $\mathbb{K}$ of dimension $N$. Then

(a) $\dim \bigwedge^n(E) = \binom{N}{n}$ for $0 \le n \le N$ and $= 0$ for $n > N$,

(b) $\bigwedge^n(E')$ is canonically isomorphic to $\bigwedge^n(E)'$ by

$$(f_1 \wedge \cdots \wedge f_n)(w_1, \dots, w_n) = \det\{f_i(w_j)\}.$$
PROOF. Part (a) is an immediate consequence of Proposition 6.33. Part (b) is
proved in the same way as Corollary 6.27b, using Proposition 6.30a as a tool. The
“positive-integer multiples” that arise in the proof of Corollary 6.27b are all 1 in
the current proof, and hence no restriction on the characteristic of $\mathbb{K}$ is needed. □
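Corollary 6.34 can be checked in a small case. The Python sketch below lists the basis of Proposition 6.33 as strictly increasing index tuples and compares its size with $\binom{N}{n}$. It then evaluates the pairing in (b) on basis elements. All helper names are hypothetical. The determinant is computed by the Leibniz formula, which is adequate only for small matrices. Every diagonal value is exactly 1, in contrast with the factors $j_1! \cdots j_N!$ that occur in the symmetric case.

```python
from itertools import combinations, permutations
from math import comb, prod


def perm_sign(p):
    """Sign of a permutation given as a tuple, computed from its inversions."""
    inv = sum(1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inv % 2 else 1


def det(M):
    """Leibniz formula: sum over permutations p of sgn(p) * prod_i M[i][p(i)]."""
    n = len(M)
    return sum(
        perm_sign(p) * prod(M[i][p[i]] for i in range(n))
        for p in permutations(range(n))
    )


def alt_pairing(r, k):
    """(u'_{r_1} wedge ... wedge u'_{r_n})(u_{k_1}, ..., u_{k_n}) = det{u'_{r_i}(u_{k_j})}."""
    return det([[1 if ri == kj else 0 for kj in k] for ri in r])


if __name__ == "__main__":
    N, n = 4, 2
    basis = list(combinations(range(N), n))  # strictly increasing index tuples
    print(len(basis), comb(N, n))  # both 6
    # Unlike the symmetric case, every diagonal pairing is exactly 1.
    print(all(alt_pairing(r, k) == (1 if r == k else 0) for r in basis for k in basis))
```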

Now let us suppose that $\mathbb{K}$ has characteristic 0. We define an $n$-multilinear
function from $E \times \cdots \times E$ into $T^n(E)$ by

$$(v_1, \dots, v_n) \mapsto \frac{1}{n!} \sum_{\tau \in \mathfrak{S}_n} (\operatorname{sgn} \tau)\, v_{\tau(1)} \otimes \cdots \otimes v_{\tau(n)},$$

and let $\sigma' : T^n(E) \to T^n(E)$ be its linear extension. We call $\sigma'$ the antisymmetrizer
operator. The image of $\sigma'$ in $T^n(E)$ is denoted by $\widetilde{\bigwedge}{}^n(E)$, and the
members of this subspace are called antisymmetrized tensors.

Proposition 6.35. Let the field $\mathbb{K}$ have characteristic 0, and let $E$ be a vector
space over $\mathbb{K}$. Then the antisymmetrizer operator $\sigma'$ satisfies $\sigma'^2 = \sigma'$. The
kernel of $\sigma'$ on $T^n(E)$ is exactly $T^n(E) \cap I'$, and therefore

$$T^n(E) = \textstyle\widetilde{\bigwedge}{}^n(E) \oplus (T^n(E) \cap I').$$

Remark. In view of this proposition, the quotient map $T^n(E) \to \bigwedge^n(E)$ carries
$\widetilde{\bigwedge}{}^n(E)$ one-one onto $\bigwedge^n(E)$. Thus $\widetilde{\bigwedge}{}^n(E)$ can be viewed as a copy of $\bigwedge^n(E)$
embedded as a direct summand of $T^n(E)$.

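Proof. We have

$$\begin{aligned}
\sigma'^2(v_1 \otimes \cdots \otimes v_n)
&= \frac{1}{(n!)^2} \sum_{\rho, \tau \in \mathfrak{S}_n} (\operatorname{sgn} \rho\tau)\, v_{\rho\tau(1)} \otimes \cdots \otimes v_{\rho\tau(n)} \\
&= \frac{1}{(n!)^2} \sum_{\rho \in \mathfrak{S}_n} \sum_{\substack{\omega \in \mathfrak{S}_n \\ (\omega = \rho\tau)}} (\operatorname{sgn} \omega)\, v_{\omega(1)} \otimes \cdots \otimes v_{\omega(n)} \\
&= \frac{1}{n!} \sum_{\rho \in \mathfrak{S}_n} \sigma'(v_1 \otimes \cdots \otimes v_n) \\
&= \sigma'(v_1 \otimes \cdots \otimes v_n).
\end{aligned}$$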


Hence $\sigma'^2 = \sigma'$. Consequently $T^n(E)$ is the direct sum of $\operatorname{image} \sigma'$ and $\ker \sigma'$,
and we are left with identifying $\ker \sigma'$ as $T^n(E) \cap I'$.

The subspace $T^n(E) \cap I'$ is spanned by elements

$$x_1 \otimes \cdots \otimes x_r \otimes v \otimes v \otimes y_1 \otimes \cdots \otimes y_s$$
with $r + 2 + s = n$, and the antisymmetrizer $\sigma'$ certainly vanishes on such elements.
Hence $T^n(E) \cap I' \subseteq \ker \sigma'$. Suppose that the inclusion is strict, say with $t$ in
$\ker \sigma'$ but $t$ not in $T^n(E) \cap I'$. Let $q$ be the quotient map $T^n(E) \to \bigwedge^n(E)$. The
kernel of $q$ is $T^n(E) \cap I'$, and thus $q(t) \ne 0$. By Proposition 6.33, the $T(E)$
monomials with strictly increasing indices map onto a basis of $\bigwedge(E)$. Since $\mathbb{K}$
has characteristic 0, the antisymmetrized versions of these monomials map to
nonzero multiples of the images of the initial monomials. Consequently $q$ carries
$\widetilde{\bigwedge}{}^n(E) = \operatorname{image} \sigma'$ onto $\bigwedge^n(E)$. Thus choose $t' \in \widetilde{\bigwedge}{}^n(E)$ with $q(t') = q(t)$.
Then $t' - t$ is in $\ker q = T^n(E) \cap I' \subseteq \ker \sigma'$. Since $\sigma'(t) = 0$, we see that
$\sigma'(t') = 0$. Consequently $t'$ is in $\ker \sigma' \cap \operatorname{image} \sigma' = 0$, and we obtain $t' = 0$
and $q(t) = q(t') = 0$, a contradiction. □
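Proposition 6.35 can also be tested numerically. This Python sketch assumes $E = \mathbb{Q}^N$, so the field has characteristic 0. It stores a tensor in $T^n(E)$ as a dictionary from index tuples $(i_1, \dots, i_n)$, standing for $e_{i_1} \otimes \cdots \otimes e_{i_n}$, to exact rational coefficients, and it weights each permuted term by $\operatorname{sgn}\tau$. The names `perm_sign` and `antisymmetrize` are hypothetical. The sketch checks $\sigma'^2 = \sigma'$ and that $\sigma'$ kills a spanning element $x \otimes v \otimes v \otimes y$ of $T^4(E) \cap I'$.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def perm_sign(p):
    """Sign of a permutation given as a tuple, computed from its inversions."""
    inv = sum(1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inv % 2 else 1


def antisymmetrize(t):
    """Apply sigma' to a tensor {(i_1, ..., i_n): coefficient} in T^n(Q^N)."""
    out = {}
    for idx, c in t.items():
        n = len(idx)
        for tau in permutations(range(n)):
            # (sgn tau) v_tau(1) (x) ... (x) v_tau(n), scaled by 1/n!
            key = tuple(idx[tau[k]] for k in range(n))
            out[key] = out.get(key, 0) + perm_sign(tau) * Fraction(c) / factorial(n)
    return {k: v for k, v in out.items() if v != 0}  # drop zero coefficients


if __name__ == "__main__":
    t = {(0, 1, 2): 1, (2, 2, 0): 5, (1, 0, 3): -2}
    print(antisymmetrize(antisymmetrize(t)) == antisymmetrize(t))  # sigma'^2 = sigma'
    # e_0 (x) e_1 (x) e_1 (x) e_2 lies in T^4(E) ∩ I' and is killed by sigma'
    print(antisymmetrize({(0, 1, 1, 2): 1}))
```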


10.  Problems 

1. Let $V$ be a vector space over a field $\mathbb{K}$, and let $\langle \cdot, \cdot \rangle$ be a nondegenerate bilinear
form on $V$.

(a) Prove that every member $v'$ of $V'$ is of the form $v'(w) = \langle v, w \rangle$ for one and
only one member $v$ of $V$.

(b) Suppose that $(\cdot, \cdot)$ is another bilinear form on $V$. Prove that there is some
linear function $L : V \to V$ such that $(v, w) = \langle L(v), w \rangle$ for all $v$ and $w$
in $V$.

2. The matrix $A = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$ with entries in $\mathbb{F}_2$ is symmetric. Prove that there is no
nonsingular $M$ with $M^t A M$ diagonal.

3. This problem shows that one possible generalization of Sylvester's Law to other
fields is not valid. Over the field $\mathbb{F}_3$, show that there is a nonsingular matrix
$M$ such that $\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = M^t \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix} M$. Conclude that the number of squares in
$\mathbb{K}^\times$ among the diagonal entries of the diagonal form in Theorem 6.5 is not an
invariant of the symmetric matrix.

4. Let $V$ be a complex $n$-dimensional vector space, let $(\cdot, \cdot)$ be a Hermitian form on
$V$, let $V^{\mathbb{R}}$ be the $2n$-dimensional real vector space obtained from $V$ by restricting
scalar multiplication to real scalars, and define $\langle \cdot, \cdot \rangle = \operatorname{Im}(\cdot, \cdot)$. Prove that

(a) $\langle \cdot, \cdot \rangle$ is an alternating bilinear form on $V^{\mathbb{R}}$,

(b) $\langle J(v_1), J(v_2) \rangle = \langle v_1, v_2 \rangle$ for all $v_1$ and $v_2$ if $J : V^{\mathbb{R}} \to V^{\mathbb{R}}$ is what
multiplication by $i$ becomes when viewed as a linear map from $V^{\mathbb{R}}$ to itself,

(c) $\langle \cdot, \cdot \rangle$ is nondegenerate on $V^{\mathbb{R}}$ if and only if $(\cdot, \cdot)$ is nondegenerate on $V$.

5. Let $W$ be a $2n$-dimensional real vector space, and let $\langle \cdot, \cdot \rangle$ be a nondegenerate
alternating bilinear form on $W$. Suppose that $J : W \to W$ is a linear map such
that $J^2 = -I$ and $\langle J(w_1), J(w_2) \rangle = \langle w_1, w_2 \rangle$ for all $w_1$ and $w_2$ in $W$. Prove
that $W$ equals $V^{\mathbb{R}}$ for some $n$-dimensional complex vector space $V$ possessing a
Hermitian form whose imaginary part is $\langle \cdot, \cdot \rangle$.

6. This problem sharpens the result of Theorem 6.7 in the nondegenerate case. Let
$\langle \cdot, \cdot \rangle$ be a nondegenerate alternating bilinear form on a $2n$-dimensional vector
space $V$ over $\mathbb{K}$. A vector subspace $S$ of $V$ is called an isotropic subspace if
$\langle u, v \rangle = 0$ for all $u$ and $v$ in $S$. Prove that

(a) any isotropic subspace of $V$ that is maximal under inclusion has dimension
$n$,

(b) for any maximal isotropic subspace $S_1$, there exists a second maximal
isotropic subspace $S_2$ such that $S_1 \cap S_2 = 0$,

(c) if $S_1$ and $S_2$ are maximal isotropic subspaces of $V$ such that $S_1 \cap S_2 = 0$,
then the linear map $S_2 \to S_1'$ given by $s_2 \mapsto \langle \cdot, s_2 \rangle|_{S_1}$ is an isomorphism of
$S_2$ onto the dual space $S_1'$,

(d) if $S_1$ and $S_2$ are maximal isotropic subspaces of $V$ such that $S_1 \cap S_2 = 0$,
then there exist bases $\{p_1, \dots, p_n\}$ of $S_1$ and $\{q_1, \dots, q_n\}$ of $S_2$ such that
$\langle p_i, p_j \rangle = \langle q_i, q_j \rangle = 0$ and $\langle p_i, q_j \rangle = \delta_{ij}$ for all $i$ and $j$. (The resulting
basis $\{p_1, \dots, p_n, q_1, \dots, q_n\}$ of $V$ is called a Weyl basis of $V$.)

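7. Let $S$ be a nonempty set, and let $\mathbb{K}$ be a field. For $s$ in $S$, let $U_s$ and $V_s$ be vector
spaces over $\mathbb{K}$, and let $U$ and $V$ be two further vector spaces over $\mathbb{K}$.

(a) Prove that $\operatorname{Hom}_{\mathbb{K}}\big(\bigoplus_{s \in S} U_s, V\big) \cong \prod_{s \in S} \operatorname{Hom}_{\mathbb{K}}(U_s, V)$.

(b) Prove that $\operatorname{Hom}_{\mathbb{K}}\big(U, \prod_{s \in S} V_s\big) \cong \prod_{s \in S} \operatorname{Hom}_{\mathbb{K}}(U, V_s)$.

(c) Give examples to show that neither isomorphism in (a) and (b) need remain
valid if all three direct products are changed to direct sums.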

8. This problem continues Problem 1 at the end of Chapter V, which established
a canonical-form theorem for an action of $GL(m, \mathbb{K}) \times GL(n, \mathbb{K})$ on $m$-by-$n$
matrices. For the present problem, the group $GL(n, \mathbb{K})$ acts on $M_n(\mathbb{K})$ by
$(g, x) \mapsto g x g^t$.

(a) Verify that this is indeed a group action and that the vector subspaces $A_{nn}(\mathbb{K})$
of alternating matrices and $S_{nn}(\mathbb{K})$ of symmetric matrices are mapped into
themselves under the group action.

(b) Prove that two members of $A_{nn}(\mathbb{K})$ lie in the same orbit if and only if they
have the same rank, and that the rank must be even. For each even rank $\le n$,
find an example of a member of $A_{nn}(\mathbb{K})$ with that rank.

(c) Prove that two members of $S_{nn}(\mathbb{C})$ lie in the same orbit if and only if they
have the same rank, and for each rank $\le n$, find an example of a member of
$S_{nn}(\mathbb{C})$ with that rank.

9. Let $U$ and $V$ be vector spaces over $\mathbb{K}$, and let $U'$ be the dual of $U$. The bilinear
map $(u', v) \mapsto u'(\cdot)v$ of $U' \times V$ into $\operatorname{Hom}_{\mathbb{K}}(U, V)$ extends to a linear map
$T_{UV} : U' \otimes_{\mathbb{K}} V \to \operatorname{Hom}_{\mathbb{K}}(U, V)$. Do the following:

(a) Prove that $T_{UV}$ is one-one.

(b) Prove that $T_{UV}$ is onto $\operatorname{Hom}_{\mathbb{K}}(U, V)$ if $U$ is finite-dimensional.

(c) Give an example for which $T_{UV}$ is not onto $\operatorname{Hom}_{\mathbb{K}}(U, V)$.

(d) Let $\mathcal{C}$ be the category of all vector spaces over $\mathbb{K}$, and let $\Phi$ and $\Psi$ be the
functors from $\mathcal{C} \times \mathcal{C}$ into $\mathcal{C}$ whose effects on objects are $\Phi(U, V) = U' \otimes_{\mathbb{K}} V$
and $\Psi(U, V) = \operatorname{Hom}_{\mathbb{K}}(U, V)$. Prove that the system $\{T_{UV}\}$ is a natural
transformation of $\Phi$ into $\Psi$.

(e) In view of (c), can the system $\{T_{UV}\}$ be a natural isomorphism?

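10. Let $\mathbb{K} \subseteq \mathbb{L}$ be an inclusion of fields, and let $\mathcal{V}_{\mathbb{K}}$ and $\mathcal{V}_{\mathbb{L}}$ be the categories of
vector spaces over $\mathbb{K}$ and $\mathbb{L}$. Section 6 of the text defined extension of scalars as
a covariant functor $\Phi(E) = E \otimes_{\mathbb{K}} \mathbb{L}$. Another definition of extension of scalars
is $\Psi(E) = \operatorname{Hom}_{\mathbb{K}}(\mathbb{L}, E)$ with $(l\varphi)(l') = \varphi(ll')$. Verify that $\Psi(E)$ is a vector
space over $\mathbb{L}$ and that $\Psi$ is a functor.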

11. A linear map $L : E \to F$ between finite-dimensional complex vector spaces
becomes a linear map $L^{\mathbb{R}} : E^{\mathbb{R}} \to F^{\mathbb{R}}$ when we restrict attention to real scalars.
Explain how to express a matrix for $L^{\mathbb{R}}$ in terms of a matrix for $L$.

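12. (Kronecker product of matrices) Let $L : E_1 \to F_1$ and $M : E_2 \to F_2$ be
linear maps between finite-dimensional vector spaces over $\mathbb{K}$, let $\Gamma_1$ and $\Gamma_2$ be
ordered bases of $E_1$ and $E_2$, and let $\Delta_1$ and $\Delta_2$ be ordered bases of $F_1$ and $F_2$.
Define matrices $A$ and $B$ by $A = \begin{pmatrix} L \\ \Delta_1 \Gamma_1 \end{pmatrix}$ and $B = \begin{pmatrix} M \\ \Delta_2 \Gamma_2 \end{pmatrix}$. Use $\Gamma_1$, $\Gamma_2$, $\Delta_1$,
and $\Delta_2$ to define ordered bases $\Omega_1$ and $\Omega_2$ of $E_1 \otimes_{\mathbb{K}} E_2$ and $F_1 \otimes_{\mathbb{K}} F_2$, and
describe how the matrix $C = \begin{pmatrix} L \otimes M \\ \Omega_2 \Omega_1 \end{pmatrix}$ is related to $A$ and $B$.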

13. Let $\mathbb{K}$ be a field, and let $E$ be the vector space $\mathbb{K}X \oplus \mathbb{K}Y$. Prove that the subalgebra
of $T(E)$ generated by 1, $Y$, and $X^2 + XY + Y^2$ is isomorphic as an algebra with
identity to $T(F)$ for some vector space $F$.

Problems 14-17 concern the functors $E \mapsto T(E)$, $E \mapsto S(E)$, and $E \mapsto \bigwedge E$
defined for vector spaces over a field $\mathbb{K}$.

14. If $\varphi : E \to F$ is a linear map between vector spaces over $\mathbb{K}$, Section 8 of the text
indicated how to define a corresponding homomorphism $\Phi : S(E) \to S(F)$ of
associative algebras with identity over $\mathbb{K}$, using Proposition 6.23b.

(a) Fill in the details of this application of Proposition 6.23b.

(b) Establish the appropriate conditions on mappings that complete the proof
that $E \mapsto S(E)$ is a functor.

(c) Verify that $\Phi$ carries $S^n(E)$ linearly into $S^n(F)$ for all integers $n \ge 0$.

15. Suppose that a linear map $\varphi : E \to E$ is given. Let $\Phi : S(E) \to S(E)$ and
$\widetilde{\Phi} : T(E) \to T(E)$ be the associated algebra homomorphisms of $S(E)$ into itself
and of $T(E)$ into itself, and let $q : T(E) \to S(E)$ be the quotient homomorphism
appearing in the definition of $S(E)$. These mappings are related by the equation
$\Phi q(x) = q\widetilde{\Phi}(x)$ for $x$ in $T(E)$. Proposition 6.29 shows for each $n \ge 0$ that
$T^n(E) = \widetilde{S}^n(E) \oplus (T^n(E) \cap I)$, where $\widetilde{S}^n(E)$ is the image of $T^n(E)$ under the
symmetrizer mapping. The remark with the proposition observes that $q$ carries
$\widetilde{S}^n(E)$ one-one onto $S^n(E)$. Prove that $\widetilde{\Phi}$ carries $\widetilde{S}^n(E)$ into itself and that
$\widetilde{\Phi}|_{\widetilde{S}^n(E)}$ matches $\Phi|_{S^n(E)}$ in the sense that $q\widetilde{\Phi}(x) = \Phi q(x)$ for all $x$ in $\widetilde{S}^n(E)$.

16. With $E$ finite-dimensional let $\varphi : E \to E$ be a linear mapping, and define
$\Phi : \bigwedge E \to \bigwedge E$ to be the corresponding algebra homomorphism of $\bigwedge E$
sending 1 into 1. This carries each $\bigwedge^n E$ into itself. Prove that $\Phi$ acts as
multiplication by the scalar $\det \varphi$ on the 1-dimensional space $\bigwedge^{\dim E}(E)$.

17. Suppose that $G$ is a group, that the vector space $E$ over $\mathbb{K}$ is finite-dimensional,
and that $\varphi : G \to GL(E)$ is a representation of $G$ on $E$. The functors $E \mapsto T(E)$,
$E \mapsto S(E)$, and $E \mapsto \bigwedge E$ yield, for each $\varphi(g)$, algebra homomorphisms of
$T(E)$ into itself, $S(E)$ into itself, and $\bigwedge E$ into itself.

(a) Show that as $g$ varies, the result in each case is a representation of $G$.

(b) Suppose that $E = \mathbb{K}^n$. Give a formula for the representation of $G$ on a
member of $P(\mathbb{K}^n) = S((\mathbb{K}^n)')$.

Problems 18-22 concern universal mapping properties. Let $\mathcal{A}$ and $\mathcal{V}$ be two categories,
and let $T : \mathcal{A} \to \mathcal{V}$ be a covariant functor. (In practice, $T$ tends to be a
relatively simple functor, such as one that simply ignores some of the structure of
$\mathcal{A}$.) Let $E$ be in $\operatorname{Obj}(\mathcal{V})$. A pair $(S, \iota)$ with $S$ in $\operatorname{Obj}(\mathcal{A})$ and $\iota$ in $\operatorname{Morph}_{\mathcal{V}}(E, T(S))$
is said to have the universal mapping property relative to $E$ and $T$ if the following
condition is satisfied. Whenever $A$ is in $\operatorname{Obj}(\mathcal{A})$ and a member $\ell$ of $\operatorname{Morph}_{\mathcal{V}}(E, T(A))$
is given, there exists a unique member $L$ of $\operatorname{Morph}_{\mathcal{A}}(S, A)$ such that $T(L)\iota = \ell$.

18. (a) By suitably specializing $\mathcal A$, $\mathcal V$, $T$, etc., show that the universal mapping
property of the symmetric algebra of a vector space over $K$ is an instance of
what has been described.

(b)  How  should  the  answer  to  (a)  be  adjusted  so  as  to  account  for  the  universal 
mapping  property  of  the  exterior  algebra  of  a  vector  space  over  K? 

(c)  How  should  the  answer  to  (a)  be  adjusted  so  as  to  account  for  the  universal 
mapping property of the coproduct of $\{X_j\}_{j \in J}$ in a category $\mathcal C$, the universal
mapping property being as in Figure 4.12? (Educational note: For the
product of $\{X_j\}_{j \in J}$ in $\mathcal C$, the above description does not apply directly because
the morphisms go the wrong way. Instead, one applies the above description
to the opposite categories $\mathcal A^{\mathrm{opp}}$ and $\mathcal V^{\mathrm{opp}}$, defined as in Problems 78-80 at
the  end  of  Chapter  IV.) 

19. If $(S, \iota)$ and $(S', \iota')$ are two pairs that each have the universal mapping property
relative to $E$ and $T$, prove that $S$ and $S'$ are canonically isomorphic as objects
in $\mathcal A$. More specifically prove that there exists a unique $L$ in $\operatorname{Morph}_{\mathcal A}(S, S')$ such
that $T(L)\iota = \iota'$ and that $L$ is an isomorphism whose inverse $L'$ in $\operatorname{Morph}_{\mathcal A}(S', S)$
has $T(L')\iota' = \iota$.
20. Suppose that the pair $(S, \iota)$ has the universal mapping property relative to $E$
and $T$. Let $\mathcal S$ be the category of sets, and define functors $F : \mathcal A \to \mathcal S$ and
$G : \mathcal A \to \mathcal S$ by $F(A) = \operatorname{Morph}_{\mathcal A}(S, A)$, $F(\varphi)$ equals composition on the left
by $\varphi$ for $\varphi \in \operatorname{Morph}_{\mathcal A}(A, A')$, $G(A) = \operatorname{Morph}_{\mathcal V}(E, T(A))$, and $G(\varphi)$ equals
composition on the left by $T(\varphi)$. Let $T_A : \operatorname{Morph}_{\mathcal A}(S, A) \to \operatorname{Morph}_{\mathcal V}(E, T(A))$
be the one-one onto map given by the universal mapping property. Show that the
system $\{T_A\}$ is a natural isomorphism of $F$ into $G$.

21. Suppose that $(S', \iota')$ is a second pair having the universal mapping property relative
to $E$ and $T$. Define $F' : \mathcal A \to \mathcal S$ by $F'(A) = \operatorname{Morph}_{\mathcal A}(S', A)$. Combining the
previous problem and Proposition 6.16, obtain a second proof (besides the one
in Problem 19) that $S$ and $S'$ are canonically isomorphic.

22. Suppose that for each $E$ in $\operatorname{Obj}(\mathcal V)$, there is some pair $(S, \iota)$ with the universal
mapping property relative to $E$ and $T$. Fix such a pair $(S, \iota)$ for each $E$, calling
it $(S(E), \iota_E)$. Making an appropriate construction for morphisms and carrying
out the appropriate verifications, prove that $E \mapsto S(E)$ is a functor.


Problems 23-28 introduce the Pfaffian of a $(2n)$-by-$(2n)$ alternating matrix $X = [x_{ij}]$
with entries in a field $K$. This is the polynomial in the entries of $X$ with integer
coefficients given by

$$\operatorname{Pfaff}(X) = \sum_{\substack{\text{some } \tau\text{'s} \\ \text{in } \mathfrak S_{2n}}} (\operatorname{sgn} \tau) \prod_{k=1}^{n} x_{\tau(2k-1),\,\tau(2k)},$$

where the sum is taken over those permutations $\tau$ such that $\tau(2k-1) < \tau(2k)$ for
$1 \le k \le n$ and such that $\tau(1) < \tau(3) < \cdots < \tau(2n-1)$. It will be seen that $\det X$
is the square of this polynomial. Examples of Pfaffians are

$$\operatorname{Pfaff}\begin{pmatrix} 0 & x_{12} \\ -x_{12} & 0 \end{pmatrix} = x_{12}$$

and

$$\operatorname{Pfaff}\begin{pmatrix} 0 & a & b & c \\ -a & 0 & d & e \\ -b & -d & 0 & f \\ -c & -e & -f & 0 \end{pmatrix} = af - be + cd.$$


The  problems  in  this  set  will  be  continued  at  the  end  of  Chapter  VIII. 
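The defining sum can be evaluated directly for small matrices. The following Python sketch, an illustration rather than part of the text's development, enumerates the permutations $\tau$ described above (with 0-based indices) and compares $\operatorname{Pfaff}(X)^2$ with $\det X$ computed by exact Gaussian elimination over the rationals. The function names are hypothetical, and enumerating all of $\mathfrak S_{2n}$ is only practical for small $n$.

```python
from fractions import Fraction
from itertools import permutations

def perm_sign(perm):
    """Sign of a permutation of 0..m-1, from its inversion count."""
    inv = sum(1 for i in range(len(perm)) for j in range(i + 1, len(perm)) if perm[i] > perm[j])
    return -1 if inv % 2 else 1

def pfaffian(X):
    """Pfaff(X) as the sum over tau with tau(2k-1) < tau(2k) and
    tau(1) < tau(3) < ... < tau(2n-1), written 0-based below."""
    m = len(X)
    if m % 2:
        raise ValueError("this sketch expects a (2n)-by-(2n) matrix")
    total = 0
    for tau in permutations(range(m)):
        firsts, seconds = tau[0::2], tau[1::2]
        if any(a > b for a, b in zip(firsts, seconds)):
            continue  # require tau(2k-1) < tau(2k)
        if any(firsts[k] > firsts[k + 1] for k in range(len(firsts) - 1)):
            continue  # require tau(1) < tau(3) < ... < tau(2n-1)
        term = perm_sign(tau)
        for a, b in zip(firsts, seconds):
            term *= X[a][b]
        total += term
    return total

def det(X):
    """Determinant by Gaussian elimination with exact Fraction arithmetic."""
    A = [[Fraction(v) for v in row] for row in X]
    n = len(A)
    result = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if A[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            A[col], A[pivot] = A[pivot], A[col]
            result = -result  # a row swap flips the sign
        result *= A[col][col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= factor * A[col][c]
    return result

a, b, c, d, e, f = 1, 2, 3, 4, 5, 6
X = [[0, a, b, c], [-a, 0, d, e], [-b, -d, 0, f], [-c, -e, -f, 0]]
print(pfaffian(X), det(X))  # 8 64, since af - be + cd = 6 - 10 + 12
```

The example reproduces the $4$-by-$4$ formula above and the relation $\det X = (\operatorname{Pfaff}(X))^2$ that Problem 28 asks one to prove.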

23. For the matrix $J$ in Section 5, show that $\operatorname{Pfaff}(J) = 1$.

24. In the expansion $\det X = \sum_{\sigma \in \mathfrak S_{2n}} (\operatorname{sgn} \sigma) \prod_{i=1}^{2n} x_{i,\sigma(i)}$, prove that the value of
the right side with $X$ as above is not changed if the sum is extended only over
those $\sigma$'s whose expansion in terms of disjoint cycles involves only cycles of
even length (and in particular no cycles of length 1).
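Problem 24 can be checked numerically before it is proved. The Python sketch below (hypothetical names, 0-based indices) decomposes each permutation into disjoint cycles and compares the full Leibniz sum with the sum restricted to permutations all of whose cycles have even length, for an integer alternating matrix. Agreement on examples is evidence only.

```python
from itertools import permutations

def perm_sign(perm):
    """Sign of a permutation of 0..m-1, from its inversion count."""
    inv = sum(1 for i in range(len(perm)) for j in range(i + 1, len(perm)) if perm[i] > perm[j])
    return -1 if inv % 2 else 1

def cycle_lengths(perm):
    """Lengths of the cycles in the disjoint-cycle decomposition of perm."""
    seen = [False] * len(perm)
    lengths = []
    for start in range(len(perm)):
        if not seen[start]:
            length, i = 0, start
            while not seen[i]:
                seen[i] = True
                i = perm[i]
                length += 1
            lengths.append(length)
    return lengths

def is_good(perm):
    """True when every cycle has even length (in particular, no fixed points)."""
    return all(length % 2 == 0 for length in cycle_lengths(perm))

def leibniz_sum(X, keep=lambda perm: True):
    """Sum of sgn(sigma) * prod_i X[i][sigma(i)] over the permutations sigma accepted by keep."""
    m = len(X)
    total = 0
    for sigma in permutations(range(m)):
        if keep(sigma):
            term = perm_sign(sigma)
            for i in range(m):
                term *= X[i][sigma[i]]
            total += term
    return total

X = [[0, 1, 2, 3], [-1, 0, 4, 5], [-2, -4, 0, 6], [-3, -5, -6, 0]]
print(leibniz_sum(X), leibniz_sum(X, keep=is_good))  # both sums equal det X = 64
```

The predicate `is_good` is exactly the notion of a "good" permutation introduced in Problem 25.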

25. Define $\sigma \in \mathfrak S_{2n}$ to be “good” if its expansion in terms of disjoint cycles involves
only cycles of even length. If $\sigma$ is good, show that there uniquely exist two
disjoint subsets $A$ and $B$ of $n$ elements each in $\{1, \dots, 2n\}$ such that $A$ contains
the smallest-numbered index in each cycle and such that $\sigma$ maps each set onto
the other.
26. In the notation of the previous problem with $\sigma$ good, let $y(\sigma)$ be the product
of the monomials $x_{ab}$ such that $a$ is in $A$ and $b = \sigma(a)$. For each factor $x_{ij}$ of
$y(\sigma)$ with $i > j$, replace the factor by $-x_{ji}$. In the resulting product, arrange
the factors in order so that their first subscripts are increasing, and denote this
expression by $s\, x_{i_1 i_2} x_{i_3 i_4} \cdots x_{i_{2n-1} i_{2n}}$, where $s$ is a sign. Let $\tau$ be the permutation
that carries each $r$ to $i_r$, and define $s(\tau)$ to be the sign $s$. Similarly let $z(\sigma)$
be the product of the monomials $x_{ba}$ such that $b$ is in $B$ and $a = \sigma(b)$. For
each factor $x_{ij}$ of $z(\sigma)$ with $i > j$, replace the factor by $-x_{ji}$. In the resulting
product, arrange the factors in order so that their first subscripts are increasing,
and denote this expression by $s'\, x_{j_1 j_2} x_{j_3 j_4} \cdots x_{j_{2n-1} j_{2n}}$, where $s'$ is a sign. Let $\tau'$
be the permutation that carries each $r$ to $j_r$, and define $s'(\tau')$ to be the sign $s'$.
Prove, apart from signs, that the $\sigma$th term in the expansion of $\det X$ matches the
product of the $\tau$th term of $\operatorname{Pfaff}(X)$ and the $\tau'$th term of $\operatorname{Pfaff}(X)$.

27. In the previous problem, take the signs $s(\tau)$ and $s'(\tau')$ into account and show
that the signs of $\sigma$, $\tau$, and $\tau'$ work out so that the $\sigma$th term in the expansion of
$\det X$ is the product of the $\tau$th and $\tau'$th terms of $\operatorname{Pfaff}(X)$.

28. Show that every term of the product of $\operatorname{Pfaff}(X)$ with itself is accounted for once
and only once by the construction in the previous three problems, and conclude
that the alternating matrix $X$ has $\det X = (\operatorname{Pfaff}(X))^2$.

Problems 29-30 concern filtrations and gradings. A vector space $V$ over $K$ is said
to be filtered when an increasing sequence of subspaces $V_0 \subseteq V_1 \subseteq V_2 \subseteq \cdots$ is
specified with union $V$. In this case we put $V_{-1} = 0$ by convention. The space $V$ is
graded if a sequence of subspaces $V^0, V^1, V^2, \dots$ is specified such that

$$V = \bigoplus_{n=0}^{\infty} V^n.$$

When $V$ is graded, there is a natural filtration of $V$ given by $V_n = \bigoplus_{k=0}^{n} V^k$. Examples
of graded vector spaces are any tensor algebra $V = T(E)$, symmetric algebra $S(E)$,
exterior algebra $\bigwedge(E)$, and polynomial algebra $P(E)$, the $n$th subspace of the grading
consisting of those elements that are homogeneous of degree $n$. Any polynomial
algebra $K[X_1, \dots, X_n]$ is another example of a graded vector space, the grading
being by total degree.
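To make the grading of $K[X_1, \dots, X_n]$ by total degree and its induced filtration concrete, here is a small Python sketch. It assumes polynomials with integer or rational coefficients stored as dictionaries from exponent tuples to coefficients; this encoding and the function names are hypothetical choices for illustration. It splits a polynomial into its components in the $V^n$, reports the least $n$ with the polynomial in $V_n$ (using the convention $V_{-1} = 0$ for the zero polynomial), and multiplies polynomials so that one can see degrees add.

```python
from collections import defaultdict

def homogeneous_components(poly):
    """Split {exponent tuple: coefficient} into graded pieces,
    returning {total degree n: component lying in V^n}."""
    pieces = defaultdict(dict)
    for exponents, coeff in poly.items():
        if coeff != 0:
            pieces[sum(exponents)][exponents] = coeff
    return dict(pieces)

def filtration_degree(poly):
    """Least n with poly in V_n = V^0 + ... + V^n; the zero polynomial gives -1 (V_{-1} = 0)."""
    degrees = [sum(e) for e, c in poly.items() if c != 0]
    return max(degrees, default=-1)

def multiply(p, q):
    """Product in K[X_1, ..., X_n]; exponents add, so V^m V^n lies in V^(m+n)."""
    result = defaultdict(int)
    for e1, c1 in p.items():
        for e2, c2 in q.items():
            result[tuple(a + b for a, b in zip(e1, e2))] += c1 * c2
    return {e: c for e, c in result.items() if c != 0}

p = {(0, 0): 3, (1, 1): 2, (3, 0): 1}  # 3 + 2 X1 X2 + X1^3
print(homogeneous_components(p), filtration_degree(p))
```

Here $p$ has components in degrees 0, 2, and 3 and so lies in $V_3$ but not in $V_2$.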

29. When $V$ is a filtered vector space as in (A.34), the associated graded vector
space is $\operatorname{gr} V = \bigoplus_{n=0}^{\infty} V_n / V_{n-1}$. Let $V$ and $V^\#$ be two filtered vector spaces,
and let $\varphi$ be a linear map between them such that $\varphi(V_n) \subseteq V^\#_n$ for all $n$. Since
the restriction of $\varphi$ to $V_n$ carries $V_{n-1}$ into $V^\#_{n-1}$, this restriction induces a linear
map $\operatorname{gr}^n \varphi : (V_n / V_{n-1}) \to (V^\#_n / V^\#_{n-1})$. The direct sum of these linear maps
is then a linear map $\operatorname{gr} \varphi : \operatorname{gr} V \to \operatorname{gr} V^\#$ called the associated graded map
for $\varphi$. Prove that if $\operatorname{gr} \varphi$ is a vector-space isomorphism, then $\varphi$ is a vector-space
isomorphism.
30. Let $A$ be an associative algebra over $K$ with identity. If $A$ has a filtration
$A_0, A_1, \dots$ of vector subspaces with $1 \in A_0$ such that $A_m A_n \subseteq A_{m+n}$ for
all $m$ and $n$, then one says that $A$ is a filtered associative algebra; similarly
if $A$ is graded as $A = \bigoplus_{n=0}^{\infty} A^n$ in such a way that $A^m A^n \subseteq A^{m+n}$ for all $m$
and $n$, then one says that $A$ is a graded associative algebra. If $A$ is a filtered
associative algebra with identity, prove that the graded vector space $\operatorname{gr} A$ acquires
a multiplication in a natural way, making it into a graded associative algebra with
identity.

Problems 31-35 concern Lie algebras and their universal enveloping algebras. If $K$
is a field, a Lie algebra $\mathfrak g$ over $K$ is a nonassociative algebra whose product, called
the Lie bracket and written $[x, y]$, is alternating as a function of the pair $(x, y)$ and
satisfies the Jacobi identity $[x, [y, z]] + [y, [z, x]] + [z, [x, y]] = 0$ for all $x, y, z$ in
$\mathfrak g$. The universal enveloping algebra $U(\mathfrak g)$ of $\mathfrak g$ is the quotient $T(\mathfrak g)/I$, where $I$
is the two-sided ideal generated by all elements $x \otimes y - y \otimes x - [x, y]$ with $x$ and
$y$ in $T^1(\mathfrak g)$. The grading for $T(\mathfrak g)$ makes $U(\mathfrak g)$ into a filtered associative algebra with
identity. The product of $x$ and $y$ in $U(\mathfrak g)$ is written $xy$.

31. If $A$ is an associative algebra over $K$, prove that $A$ becomes a Lie algebra if the
Lie bracket is defined by $[x, y] = xy - yx$. In particular, observe that $M_n(K)$
becomes a Lie algebra in this way.

32. Fix a matrix $A \in M_n(K)$, and let $\mathfrak g$ be the vector subspace of all members $x$ of
$M_n(K)$ with $x^t A + A x = 0$.

(a) Prove that $\mathfrak g$ is closed under the bracket operation of the previous problem
and is therefore a Lie subalgebra of $M_n(K)$.

(b) Deduce as a special case of (a) that the vector space of all skew-symmetric
matrices in $M_n(K)$ is a Lie subalgebra of $M_n(K)$.
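Problems 31 and 32 are easy to test on examples. The Python sketch below assumes integer matrices stored as lists of rows (all function names are hypothetical). It forms the commutator $[x, y] = xy - yx$, evaluates the Jacobi expression, and tests membership in $\mathfrak g = \{x : x^t A + Ax = 0\}$. Passing checks on a few matrices are evidence, not a proof.

```python
def matmul(x, y):
    """Ordinary product of square matrices given as lists of rows."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def add(*ms):
    """Entrywise sum of any number of same-size matrices."""
    n = len(ms[0])
    return [[sum(m[i][j] for m in ms) for j in range(n)] for i in range(n)]

def scale(c, x):
    return [[c * v for v in row] for row in x]

def transpose(x):
    return [list(row) for row in zip(*x)]

def is_zero(x):
    return all(v == 0 for row in x for v in row)

def bracket(x, y):
    """Commutator bracket [x, y] = xy - yx of Problem 31."""
    return add(matmul(x, y), scale(-1, matmul(y, x)))

def jacobi(x, y, z):
    """[x,[y,z]] + [y,[z,x]] + [z,[x,y]], which should be the zero matrix."""
    return add(bracket(x, bracket(y, z)), bracket(y, bracket(z, x)), bracket(z, bracket(x, y)))

def in_g(x, A):
    """Membership in g from Problem 32: x^t A + A x = 0."""
    return is_zero(add(matmul(transpose(x), A), matmul(A, x)))

I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
s1 = [[0, 1, 2], [-1, 0, 3], [-2, -3, 0]]   # skew-symmetric
s2 = [[0, 4, -1], [-4, 0, 5], [1, -5, 0]]   # skew-symmetric
print(in_g(s1, I3), in_g(s2, I3), in_g(bracket(s1, s2), I3))  # True True True
```

Taking $A$ to be the identity matrix recovers part (b): $x^t + x = 0$ says exactly that $x$ is skew-symmetric.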

33. Let $\mathfrak g$ be a Lie algebra over $K$, and let $\iota$ be the linear map obtained as the
composition of $\mathfrak g \to T^1(\mathfrak g)$ and the passage to the quotient $U(\mathfrak g)$. Prove that
$(U(\mathfrak g), \iota)$ has the following universal mapping property: whenever $l$ is any linear
map of $\mathfrak g$ into an associative algebra $A$ with identity satisfying the condition of
being a Lie algebra homomorphism, namely $l[x, y] = l(x)l(y) - l(y)l(x)$ for
all $x$ and $y$ in $\mathfrak g$, then there exists a unique associative algebra homomorphism
$L : U(\mathfrak g) \to A$ with $L(1) = 1$ such that $L \circ \iota = l$.

34. Let $\mathfrak g$ be a Lie algebra over $K$, let $\{u_\alpha\}_{\alpha \in A}$ be a vector-space basis of $\mathfrak g$, and suppose
that a simple ordering has been imposed on the index set $A$. Prove that the set of
all monomials $u_{\alpha_1}^{j_1} \cdots u_{\alpha_k}^{j_k}$ with $\alpha_1 < \cdots < \alpha_k$ and $\sum_m j_m$ arbitrary is a spanning
set for $U(\mathfrak g)$.

35. For a Lie algebra $\mathfrak g$ over $K$, the Poincaré-Birkhoff-Witt Theorem says that the
spanning set for $U(\mathfrak g)$ in the previous problem is actually a basis. Assuming this
theorem, prove that $\operatorname{gr} U(\mathfrak g)$ is isomorphic as a graded algebra to $S(\mathfrak g)$.

Problems 36-40 introduce Clifford algebras. Let $K$ be a field of characteristic $\ne 2$,
let $E$ be a finite-dimensional vector space over $K$, and let $\langle \cdot, \cdot \rangle$ be a symmetric
bilinear form on $E$. The Clifford algebra $\operatorname{Cliff}(E, \langle \cdot, \cdot \rangle)$ is the quotient $T(E)/I$,
where $I$ is the two-sided ideal generated by all elements⁵ $v \otimes v + \langle v, v \rangle$ with $v$ in
$E$. The grading for $T(E)$ makes $\operatorname{Cliff}(E, \langle \cdot, \cdot \rangle)$ into a filtered associative algebra
with identity. Products in $\operatorname{Cliff}(E, \langle \cdot, \cdot \rangle)$ are written as $ab$ with no special symbol.

36. Let $\iota$ be the composition of the inclusion $E \subseteq T^1(E)$ and the passage to the
quotient modulo $I$. Prove that $(\operatorname{Cliff}(E, \langle \cdot, \cdot \rangle), \iota)$ has the following universal
mapping property: whenever $l$ is any linear map of $E$ into an associative algebra
$A$ with identity such that $l(v)^2 = -\langle v, v \rangle 1$ for all $v \in E$, then there exists a
unique algebra homomorphism $L : \operatorname{Cliff}(E, \langle \cdot, \cdot \rangle) \to A$ with $L(1) = 1$ and
such that $L \circ \iota = l$.

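37. Let $\{u_1, \dots, u_n\}$ be a basis of $E$. Prove that the $2^n$ elements of $\operatorname{Cliff}(E, \langle \cdot, \cdot \rangle)$
given by $u_{j_1} u_{j_2} \cdots u_{j_k}$ with $j_1 < \cdots < j_k$ form a spanning set of $\operatorname{Cliff}(E, \langle \cdot, \cdot \rangle)$.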

38. Using the Principal Axis Theorem, fix a basis $\{e_1, \dots, e_n\}$ of $E$ such that
$\langle e_i, e_j \rangle = d_j \delta_{ij}$ for all $i$ and $j$. Introduce an algebra $C$ over $K$ of dimension $2^n$ with
generators $e_1, \dots, e_n$ and with a basis parametrized by subsets of $\{1, \dots, n\}$ and
given by all elements

$$e_{i_1} e_{i_2} \cdots e_{i_k} \quad \text{with } i_1 < i_2 < \cdots < i_k,$$

with the multiplication that is implicit in the rules

$$e_j^2 = -d_j \quad \text{and} \quad e_i e_j = -e_j e_i \ \text{ if } i \ne j,$$

namely, to multiply two monomials $e_{i_1} e_{i_2} \cdots e_{i_k}$ and $e_{j_1} e_{j_2} \cdots e_{j_m}$, put them end
to end, replace any occurrence of two $e_p$'s by the scalar $-d_p$, and then permute
the remaining $e_p$'s until their indices are in increasing order, introducing a minus
sign each time two distinct $e_p$'s are interchanged. Prove that the algebra $C$ is
associative.
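The multiplication rule of Problem 38 is an explicit rewriting procedure, so it can be run. The Python sketch below works under stated assumptions: indices are 0-based, a basis monomial $e_{i_1} \cdots e_{i_k}$ is the tuple `(i1, ..., ik)`, general elements are dictionaries from monomials to integer coefficients, and the list `d` holds the scalars $d_j$. It repeatedly either replaces an adjacent pair $e_p e_p$ by $-d_p$ or swaps an adjacent out-of-order pair with a sign change. It then checks associativity on all triples of basis monomials, which suffices by trilinearity for the cases tried. All names are hypothetical.

```python
from itertools import combinations

def monomial_product(left, right, d):
    """Multiply basis monomials (increasing index tuples) using e_p e_p = -d[p]
    and e_i e_j = -e_j e_i for i != j. Returns (coefficient, monomial)."""
    word = list(left) + list(right)
    coeff = 1
    changed = True
    while changed:
        changed = False
        for k in range(len(word) - 1):
            if word[k] == word[k + 1]:
                coeff *= -d[word[k]]      # e_p e_p -> -d_p
                del word[k:k + 2]
                changed = True
                break
            if word[k] > word[k + 1]:
                word[k], word[k + 1] = word[k + 1], word[k]
                coeff = -coeff            # interchanging distinct e's costs a sign
                changed = True
                break
    return coeff, tuple(word)

def multiply(x, y, d):
    """Bilinear extension to elements {monomial: coefficient}."""
    result = {}
    for m1, c1 in x.items():
        for m2, c2 in y.items():
            c, m = monomial_product(m1, m2, d)
            result[m] = result.get(m, 0) + c1 * c2 * c
    return {m: c for m, c in result.items() if c != 0}

def basis(n):
    """All 2^n increasing index tuples, including the empty monomial 1."""
    return [combo for k in range(n + 1) for combo in combinations(range(n), k)]

def is_associative(n, d):
    elems = [{m: 1} for m in basis(n)]
    return all(multiply(multiply(a, b, d), c, d) == multiply(a, multiply(b, c, d), d)
               for a in elems for b in elems for c in elems)

print(monomial_product((0, 1), (0, 1), [1, 2, -3]))  # (-2, ())
```

The printed value agrees with the hand computation $e_1e_2e_1e_2 = -e_1e_1e_2e_2 = -(-d_1)(-d_2) = -d_1d_2$.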

39. Prove that the associative algebra $C$ of the previous problem is isomorphic as an
algebra to $\operatorname{Cliff}(E, \langle \cdot, \cdot \rangle)$.

40. Prove that $\operatorname{gr} \operatorname{Cliff}(E, \langle \cdot, \cdot \rangle)$ is isomorphic as a graded algebra to $\bigwedge(E)$.

Problems 41-48 introduce finite-dimensional Heisenberg Lie algebras and the corresponding
Weyl algebras. They make use of Problems 31-35 concerning Lie algebras
and universal enveloping algebras. Let $V$ be a finite-dimensional vector space over
the field $K$, and let $\langle \cdot, \cdot \rangle$ be a nondegenerate alternating bilinear form on $V \times V$.
Write $2n$ for the dimension of $V$. Introduce an indeterminate $X_0$. The Heisenberg
Lie algebra $H(V)$ on $V$ is a Lie algebra whose underlying vector space is $KX_0 \oplus V$
and whose Lie bracket is given by $[(cX_0, u), (dX_0, v)] = \langle u, v \rangle X_0$. Let $U(H(V))$ be
its universal enveloping algebra. The Weyl algebra $W(V)$ on $V$ is the quotient of the
tensor algebra $T(V)$ by the two-sided ideal generated by all $u \otimes v - v \otimes u - \langle u, v \rangle 1$
with $u$ and $v$ in $V$; as such, it is a filtered associative algebra.

⁵Some authors factor out the elements $v \otimes v - \langle v, v \rangle$ instead. There is no generally accepted
convention.
41. Verify when the field is $K = \mathbb R$ that an example of a $2n$-dimensional $V$ with its
nondegenerate alternating bilinear form $\langle \cdot, \cdot \rangle$ is $V = \mathbb C^n$ with $\langle u, v \rangle = \operatorname{Im}(u, v)$,
where $(\cdot, \cdot)$ is the usual inner product on $\mathbb C^n$. For this $V$, exhibit a Lie-algebra
isomorphism of $H(V)$ with the Lie algebra of all complex $(n+2)$-by-$(n+2)$
matrices of the form

$$\begin{pmatrix} 0 & \bar z^{\,t} & ir \\ 0 & 0 & z \\ 0 & 0 & 0 \end{pmatrix} \quad \text{with } z \in \mathbb C^n \text{ and } r \in \mathbb R.$$

42. In the general situation show that the linear map $\iota(cX_0, v) = c1 + v$ is a Lie algebra
homomorphism of $H(V)$ into $W(V)$ and that its extension to an associative
algebra homomorphism $\tilde\iota : U(H(V)) \to W(V)$ is onto and has kernel equal to
the two-sided ideal in $U(H(V))$ generated by $X_0 - 1$.

43. Prove that $W(V)$ has the following universal mapping property: whenever
$\varphi : H(V) \to A$ is a Lie algebra homomorphism of $H(V)$ into an associative
algebra $A$ with identity such that $\varphi(X_0) = 1$, then there exists a unique associative
algebra homomorphism $\tilde\varphi$ of $W(V)$ into $A$ such that $\varphi = \tilde\varphi \circ \iota$.

44. Let $v_1, \dots, v_{2n}$ be any vector space basis of $V$. Prove that the elements $v_1^{k_1} \cdots v_{2n}^{k_{2n}}$
with integer exponents $\ge 0$ span $W(V)$.

45. For $K = \mathbb R$, let $\mathcal S$ be the vector space of all real-valued functions $P(x)e^{-\pi|x|^2}$,
where $P(x)$ is a polynomial in $n$ real variables. Show that $\mathcal S$ is mapped into itself
by the linear operators $\partial/\partial x_j$ and $m_j = (\text{multiplication by } x_j)$.
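For $n = 1$ the closure in Problem 45 comes from $\frac{d}{dx}\bigl(P(x)e^{-\pi x^2}\bigr) = \bigl(P'(x) - 2\pi x P(x)\bigr)e^{-\pi x^2}$, so each operator can be tracked on the polynomial factor alone. The Python sketch below assumes $n = 1$ and stores $P$ as a list of float coefficients, lowest degree first (an illustrative encoding; the function names are hypothetical). It applies $d/dx$ and $m$ in this way and checks the commutation relation $\frac{d}{dx}\, m - m\, \frac{d}{dx} = 1$ on a sample element, up to floating-point tolerance.

```python
import math

def trim(p):
    """Drop trailing zero coefficients, keeping at least one entry."""
    p = list(p)
    while len(p) > 1 and p[-1] == 0:
        p.pop()
    return p

def poly_add(p, q):
    n = max(len(p), len(q))
    return trim([(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0) for i in range(n)])

def poly_scale(c, p):
    return trim([c * a for a in p])

def times_x(p):
    """Coefficients of x * P(x)."""
    return trim([0] + list(p))

def derivative(p):
    """Coefficients of P'(x)."""
    return trim([k * p[k] for k in range(1, len(p))] or [0])

def d_dx(p):
    """Polynomial factor of d/dx (P(x) e^{-pi x^2}), namely P' - 2 pi x P."""
    return poly_add(derivative(p), poly_scale(-2 * math.pi, times_x(p)))

def m(p):
    """Polynomial factor of x * P(x) e^{-pi x^2}."""
    return times_x(p)

def poly_close(p, q, tol=1e-9):
    """Compare coefficient lists up to floating-point tolerance."""
    n = max(len(p), len(q))
    pad = lambda r: list(r) + [0.0] * (n - len(r))
    return all(math.isclose(a, b, abs_tol=tol) for a, b in zip(pad(p), pad(q)))

P = [1.0, -3.0, 0.5, 2.0]  # P(x) = 1 - 3x + 0.5x^2 + 2x^3
commutator = poly_add(d_dx(m(P)), poly_scale(-1, m(d_dx(P))))
print(poly_close(commutator, P))  # True
```

The $-2\pi x^2 P$ terms cancel in the commutator, which is why the identity operator remains.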

46. With $K = \mathbb R$, let $\{p_1, \dots, p_n, q_1, \dots, q_n\}$ be a Weyl basis of $V$ in the terminology
of Problem 6. In the notation of Problem 45, let $\varphi : V \to \operatorname{Hom}_{\mathbb R}(\mathcal S, \mathcal S)$ be the
linear map given by $\varphi(p_i) = \partial/\partial x_i$ and $\varphi(q_i) = m_i$. Use Problem 43 to extend
$\varphi$ to an algebra homomorphism $\tilde\varphi : W(V) \to \operatorname{Hom}_{\mathbb R}(\mathcal S, \mathcal S)$ with $\tilde\varphi(1) = 1$,
and use Problem 42 to obtain a representation of $H(V)$ on $\mathcal S$. Prove that this
representation of $H(V)$ is irreducible in the sense that there is no proper nonzero
vector subspace carried to itself by all members of $\varphi(H(V))$.

47. In Problem 46 with $K = \mathbb R$, prove that the associative algebra homomorphism
$\tilde\varphi : W(V) \to \operatorname{Hom}_{\mathbb R}(\mathcal S, \mathcal S)$ is one-one. Conclude for $K = \mathbb R$ that the elements
$v_1^{k_1} \cdots v_{2n}^{k_{2n}}$ of Problem 44 form a vector-space basis of $W(V)$.

48. For $K = \mathbb R$, prove that $\operatorname{gr} W(V)$ is isomorphic as a graded algebra to $S(V)$.

Problems 49-51 deal with Jordan algebras. Let $K$ be a field of characteristic $\ne 2$. An
algebra $J$ over $K$ with multiplication $a \cdot b$ is called a Jordan algebra if the identities
$a \cdot b = b \cdot a$ and $a^2 \cdot (b \cdot a) = (a^2 \cdot b) \cdot a$ are always satisfied; here $a^2$ is an abbreviation
for $a \cdot a$.

49. Let $A$ be an associative algebra, and define $a \cdot b = \tfrac12(ab + ba)$. Prove that $A$
becomes a Jordan algebra under this new multiplication.
50. In the situation of the previous problem, suppose that $a \mapsto a^t$ is a one-one linear
mapping of $A$ onto itself such that $(ab)^t = b^t a^t$ for all $a$ and $b$. (For example,
$a \mapsto a^t$ could be the transpose mapping if $A = M_n(K)$.) Prove that the vector
subspace of all $a$ with $a^t = a$ is carried to itself by the Jordan product $a \cdot b$ and
hence is a Jordan algebra.

51. Let $V$ be a finite-dimensional vector space over $K$, and let $(\cdot, \cdot)$ be a symmetric
bilinear form on $V$. Define $A = K1 \oplus V$ as a vector space, and define a
multiplication in $A$ by $(c1, x) \cdot (d1, y) = ((cd + (x, y))1,\ cy + dx)$. Prove that
$A$ is a Jordan algebra under this definition of multiplication.
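The identities in Problems 49-51 can be spot-checked with exact rational arithmetic. In the Python sketch below (hypothetical names throughout), matrices are lists of rows, an element $(c1, x)$ of $K1 \oplus V$ is the pair `(c, x)` with `x` a tuple, and the symmetric bilinear form on $V = \mathbb Q^3$ is given by an arbitrarily chosen symmetric matrix `G`. One product is the Jordan product $\tfrac12(ab + ba)$ of Problem 49, the other is the multiplication of Problem 51, and a helper tests both defining identities of a Jordan algebra on given elements.

```python
from fractions import Fraction

def matmul(x, y):
    """Ordinary (associative) product of square matrices."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def jordan_matrix(a, b):
    """Jordan product a . b = (ab + ba)/2 from Problem 49."""
    ab, ba = matmul(a, b), matmul(b, a)
    n = len(a)
    return [[(ab[i][j] + ba[i][j]) * Fraction(1, 2) for j in range(n)] for i in range(n)]

G = [[1, 2, 0], [2, -1, 3], [0, 3, 5]]  # an arbitrary symmetric matrix defining (x, y)

def form(x, y):
    """Symmetric bilinear form (x, y) = sum_ij x_i G_ij y_j on V = Q^3."""
    return sum(x[i] * G[i][j] * y[j] for i in range(3) for j in range(3))

def spin_product(u, v):
    """(c1, x) . (d1, y) = ((cd + (x, y))1, cy + dx) from Problem 51."""
    (c, x), (d, y) = u, v
    return (c * d + form(x, y), tuple(c * yi + d * xi for xi, yi in zip(x, y)))

def satisfies_jordan_identities(mul, a, b):
    """Check a.b = b.a and (a^2).(b.a) = ((a^2).b).a for the product mul."""
    a2 = mul(a, a)
    return mul(a, b) == mul(b, a) and mul(a2, mul(b, a)) == mul(mul(a2, b), a)

a, b = [[1, 2], [0, 3]], [[0, 1], [4, -1]]
print(satisfies_jordan_identities(jordan_matrix, a, b))                              # True
print(satisfies_jordan_identities(spin_product, (2, (1, 0, -1)), (-1, (3, 1, 2))))  # True
```

Using `Fraction` keeps the factor $\tfrac12$ exact, which mirrors the standing assumption that the characteristic is not 2.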

Problems 52-56 deal with the algebra $\mathbb O$ of real octonions, sometimes known as
the Cayley numbers. This is a certain 8-dimensional nonassociative algebra with
identity over $\mathbb R$ with an inner product such that $\|ab\| = \|a\|\,\|b\|$ for all $a$ and $b$ and
such that the left and right multiplications by any element $a \ne 0$ are always invertible.

52. Let $A$ be an algebra over $\mathbb R$. Let $[a, b] = ab - ba$ and $[a, b, c] = (ab)c - a(bc)$.

(a) The 3-multilinear function $(a, b, c) \mapsto [a, b, c]$ from $A \times A \times A$ to $A$ is called
the associator in $A$. Observe that it is 0 if and only if $A$ is associative. Show
that it is alternating if and only if $A$ always satisfies the limited associativity
laws

$$(aa)b = a(ab), \qquad (ab)a = a(ba), \qquad (ba)a = b(aa).$$

In this case, $A$ is said to be alternative.

(b)  Show  that  A  is  alternative  if  the  first  and  third  of  the  limited  associativity 
laws  in  (a)  are  always  satisfied. 

53. (Cayley-Dickson construction) Suppose that $A$ is an algebra over $\mathbb R$ with a
two-sided identity 1, and suppose that there is an $\mathbb R$ linear function $*$ from $A$ to
itself (called “conjugation”) such that $1^* = 1$, $a^{**} = a$, and $(ab)^* = b^* a^*$ for all
$a$ and $b$ in $A$. Define an algebra $B$ over $\mathbb R$ to have the underlying real vector-space
structure of $A \oplus A$ and to have multiplication and conjugation given by

$$(a, b)(c, d) = (ac - db^*,\ a^* d + cb) \quad \text{and} \quad (a, b)^* = (a^*, -b).$$

(a) Prove that $(1, 0)$ is a two-sided identity in $B$ and that the operation $*$ in $B$
satisfies the required properties of a conjugation.

(b) Prove that if $a^* = a$ for all $a \in A$, then $A$ is commutative.

(c) Prove that if $a^* = a$ for all $a \in A$, then $B$ is commutative.

(d) Prove that if $A$ is commutative and associative, then $B$ is associative.

(e) Verify the following outcomes of the above construction $A \mapsto B$:

(i) $A = \mathbb R$ yields $B = \mathbb C$,

(ii) $A = \mathbb C$ yields $B = \mathbb H$, the algebra of quaternions.
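The doubling formula of Problem 53 is recursive and easy to implement. In the Python sketch below (an illustration with hypothetical names), an element obtained after $k$ doublings of $\mathbb R$ is a list of $2^k$ real coordinates whose first and second halves are the components $a$ and $b$ of $(a, b)$, and conjugation is the identity on $\mathbb R$. With integer coordinates the arithmetic is exact. The checks compare $(e_1e_2)e_4$ with $e_1(e_2e_4)$ for three basis vectors of $\mathbb O$ and test $\|xy\|^2 = \|x\|^2\|y\|^2$ on one pair of octonions; they illustrate Problems 53 and 55 but prove nothing.

```python
def conj(x):
    """Conjugation: trivial on R, and (a, b)* = (a*, -b) at each doubling step."""
    if len(x) == 1:
        return list(x)
    h = len(x) // 2
    return conj(x[:h]) + [-v for v in x[h:]]

def add(x, y):
    return [u + v for u, v in zip(x, y)]

def sub(x, y):
    return [u - v for u, v in zip(x, y)]

def mul(x, y):
    """(a, b)(c, d) = (ac - d b*, a* d + c b), applied recursively to lists of length 2^k."""
    if len(x) == 1:
        return [x[0] * y[0]]
    h = len(x) // 2
    a, b, c, d = x[:h], x[h:], y[:h], y[h:]
    return sub(mul(a, c), mul(d, conj(b))) + add(mul(conj(a), d), mul(c, b))

def norm2(x):
    """Squared norm; for this construction x x* equals norm2(x) times the identity."""
    return sum(v * v for v in x)

def unit(k, size):
    """Basis vector with a 1 in position k."""
    return [1 if i == k else 0 for i in range(size)]

e1, e2, e4 = unit(1, 8), unit(2, 8), unit(4, 8)
print(mul(mul(e1, e2), e4), mul(e1, mul(e2, e4)))  # the two results differ by a sign
x = [1, 2, -1, 0, 3, 1, 2, -2]
y = [0, 1, 4, -3, 1, 1, -1, 2]
print(norm2(mul(x, y)) == norm2(x) * norm2(y))  # True
```

One doubling of `[1]`-length inputs reproduces complex multiplication, for example `mul([1, 2], [3, 4])` gives `[-5, 10]`, matching $(1 + 2i)(3 + 4i) = -5 + 10i$.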
54. Suppose that $A$ is an algebra over $\mathbb R$ with an identity and a conjugation as in the
previous problem. Say that $A$ is nicely normed if

(i) $a + a^*$ is always of the form $r1$ with $r$ real and

(ii) $aa^*$ always equals $a^*a$ and for $a \ne 0$, is of the form $r1$ with $r$ real and
positive.

(a) Prove that if $A$ is nicely normed, then so is the algebra $B$ of the previous
problem.

(b) Prove that if $A$ is nicely normed, then $(a, b) = \tfrac12(ab^* + ba^*)$ is an inner
product on $A$ with norm $\|a\| = (aa^*)^{1/2} = (a^*a)^{1/2}$.

(c)  Prove  that  if  A  is  associative  and  nicely  normed,  then  the  algebra  B  of  the 
previous  problem  is  alternative. 

55. Starting from the real algebra $A = \mathbb H$, apply the construction of Problem 53,
and let the resulting 8-dimensional real algebra be denoted by $\mathbb O$, the algebra of
octonions.

(a) Prove that $\mathbb O$ is an alternative algebra and is nicely normed.

(b) Prove that $(xx^*)y = x(x^*y)$ and $x(yy^*) = (xy)y^*$ within $\mathbb O$.

(c) Prove that $\|ab\|^2 a = \|a\|^2 \|b\|^2 a$ within $\mathbb O$.

(d) Conclude from (c) that the operations of left and right multiplication by any
$a \ne 0$ within $\mathbb O$ are invertible.

(e) Show that the inverse operators are left and right multiplication by $\|a\|^{-2} a^*$.

(f) Denote the usual basis vectors of $\mathbb H$ by $1, i, j, k$. Write down a multiplication
table for the eight basis vectors of $\mathbb O$ given by $(x, 0)$ and $(0, y)$ as $x$ and $y$
run through the basis vectors of $\mathbb H$.

56. What prevents the construction of Problem 53, when applied with $A = \mathbb O$, from
yielding a 16-dimensional algebra $B$ in which $\|ab\|^2 = \|a\|^2 \|b\|^2$ and therefore
in which the operations of left and right multiplication by any $a \ne 0$ within $B$
are invertible?


CHAPTER  VII 


Advanced  Group  Theory 


Abstract.  This  chapter  continues  the  development  of  group  theory  begun  in  Chapter  IV,  the  main 
topics  being  the  use  of  generators  and  relations,  representation  theory  for  finite  groups,  and  group 
extensions.  Representation  theory  uses  linear  algebra  and  inner-product  spaces  in  an  essential  way, 
and  a  structure-theory  theorem  for  finite  groups  is  obtained  as  a  consequence.  Group  extensions 
introduce  the  subject  of  cohomology  of  groups. 

Sections  1-3  concern  generators  and  relations.  The  context  for  generators  and  relations  is  that  of 
a  free  group  on  the  set  of  generators,  and  the  relations  indicate  passage  to  a  quotient  of  this  free  group 
by  a  normal  subgroup.  Section  1  constructs  free  groups  in  terms  of  words  built  from  an  alphabet 
and  shows  that  free  groups  are  characterized  by  a  certain  universal  mapping  property.  This  universal 
mapping  property  implies  that  any  group  may  be  defined  by  generators  and  relations.  Computations 
with  free  groups  are  aided  by  the  fact  that  two  reduced  words  yield  the  same  element  of  a  free  group 
if  and  only  if  the  reduced  words  are  identical.  Section  2  obtains  the  Nielsen-Schreier  Theorem  that 
subgroups  of  free  groups  are  free.  Section  3  enlarges  the  construction  of  free  groups  to  the  notion 
of  the  free  product  of  an  arbitrary  set  of  groups.  Free  product  is  what  coproduct  is  for  the  category 
of  groups;  free  groups  themselves  may  be  regarded  as  free  products  of  copies  of  the  integers. 

Sections 4-5 introduce representation theory for finite groups and give an example of an important
application whose statement lies outside representation theory. Section 4 contains various results
giving an analysis of the space $C(G, \mathbb C)$ of all complex-valued functions on a finite group $G$. In this
analysis those functions that are constant on conjugacy classes are shown to be linear combinations
of the characters of the irreducible representations. Section 5 proves Burnside’s Theorem as an
application of this theory: any finite group of order $p^a q^b$ with $p$ and $q$ prime and with $a + b > 1$
has a nontrivial normal subgroup.

Section  6  introduces  cohomology  of  groups  in  connection  with  group  extensions.  If  N  is  to  be 
a  normal  subgroup  of  G  and  Q  is  to  be  isomorphic  to  G/N,  the  first  question  is  to  parametrize  the 
possibilities  for  G  up  to  isomorphism.  A  second  question  is  to  parametrize  the  possibilities  for  G  if 
G  is  to  be  a  semidirect  product  of  N  and  Q. 


1.  Free  Groups 

This  section  and  the  next  two  introduce  some  group-theoretic  notions  that  in 
principle  apply  to  all  groups  but  in  practice  are  used  with  countable  groups,  often 
countably  infinite  groups  that  are  nonabelian.  The  material  is  especially  useful  in 
applications  in  topology,  particularly  in  connection  with  fundamental  groups  and 
covering  spaces.  But  the  formal  development  here  will  be  completely  algebraic, 
not  making  use  of  any  definitions  or  theorems  from  topology. 
In the case of abelian groups, every abelian group $G$ is a quotient of a suitable
free abelian group, i.e., a suitable direct sum of copies of the additive group $\mathbb Z$
of integers.¹ Recall the discussion of Section IV.9: We introduce a copy $\mathbb Z_g$ of
$\mathbb Z$ for each $g$ in $G$, define $\widetilde G = \bigoplus_{g \in G} \mathbb Z_g$, let $i_g : \mathbb Z_g \to \widetilde G$ be the standard
embedding, and let $\varphi_g : \mathbb Z_g \to G$ be the group homomorphism written additively
as $\varphi_g(n) = ng$. The universal mapping property of direct sums that was stated
as Proposition 4.17 produces a unique group homomorphism $\varphi : \widetilde G \to G$ such
that $\varphi \circ i_g = \varphi_g$ for all $g$, and $\varphi$ is the required homomorphism of a free abelian
group onto $G$.

The  goal  in  this  section  is  to  carry  out  an  analogous  construction  for  groups  that 
are not necessarily abelian. The constructed groups, to be called “free groups,”
are  to  be  rather  concrete,  and  the  family  of  all  of  them  is  to  have  the  property  that 
every  group  is  the  quotient  of  some  member  of  the  family. 

If $S$ is any set, we construct a “free group $F(S)$ on the set $S$.” Let us speak
of $S$ as a set of “symbols” or as the members of an “alphabet,” possibly infinite,
with which we are working. If $S$ is empty, the group $F(S)$ is taken to be the
one-element trivial group, and we shall therefore now assume that $S$ is not empty.
If $a$ is a symbol in $S$, we introduce a new symbol $a^{-1}$ corresponding to it, and we
let $S^{-1}$ denote the set of all such symbols $a^{-1}$ for $a \in S$. Define $S' = S \cup S^{-1}$.
A word is a finite string of symbols from $S'$, i.e., an ordered $n$-tuple for some
$n$ of members of $S'$ with repetitions allowed. Words that are $n$-tuples are said
to have length $n$. The empty word, with length 0, will be denoted by 1. Other
words are usually written with the symbols juxtaposed and all commas omitted,
as in $abca^{-1}cb^{-1}$. The set of words will be denoted by $W(S')$. We introduce a
multiplication $W(S') \times W(S') \to W(S')$ by writing end-to-end the words that are
to be multiplied: $(abca^{-1}, cb^{-1}) \mapsto abca^{-1}cb^{-1}$. The length of a product is the
sum of the lengths of the factors. It is plain that this multiplication is associative
and that 1 is a two-sided identity. It is not a group operation, however, since most
elements of $W(S')$ do not have inverses: multiplication never decreases length,
and thus the only way that 1 can be a product of two elements is as the product
$1 \cdot 1$. To obtain a group from $W(S')$, we shall introduce an equivalence relation in
$W(S')$.

Two words are said to be equivalent if one of the words can be obtained
from the other by a finite succession of insertions and deletions of expressions
$aa^{-1}$ or $a^{-1}a$ within the word; here $a$ is assumed to be an element of $S$. It will be
convenient to refer to the pairs $aa^{-1}$ and $a^{-1}a$ together; therefore when $b = a^{-1}$ is
in $S^{-1}$, let us define $b^{-1} = (a^{-1})^{-1}$ to be $a$. Then two words are equivalent if one
of the words can be obtained from the other by a finite succession of insertions
and deletions of expressions of the form $bb^{-1}$ with $b$ in $S'$. This definition is
arranged so that “equivalent” is an equivalence relation. We write $x \sim y$ if $x$ and
$y$ are words that are equivalent. The underlying set for the free group $F(S)$ will
be taken to be the set of equivalence classes of members of $W(S')$.

¹ Direct sum here is what coproduct, in the sense of Section IV.11, amounts to in the category of
all abelian groups.
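The chapter's overview states that two reduced words yield the same element of a free group exactly when they are identical, so each equivalence class can be represented by its reduced word. The Python sketch below assumes that fact and encodes a symbol $b \in S'$ as a pair `(name, exponent)` with exponent $\pm 1$; this encoding, the token notation `a^-1`, and all function names are hypothetical. A stack cancels each adjacent pair $bb^{-1}$ as soon as it appears, which yields a reduced word equivalent to the input.

```python
def inverse_symbol(b):
    """For b = (name, e) in S', return b^{-1} = (name, -e)."""
    name, e = b
    return (name, -e)

def reduce_word(word):
    """Delete adjacent pairs b b^{-1} until none remain, using a stack;
    the stack never holds an adjacent inverse pair, so the result is reduced."""
    stack = []
    for b in word:
        if stack and stack[-1] == inverse_symbol(b):
            stack.pop()  # delete the pair (b^{-1}, b)
        else:
            stack.append(b)
    return tuple(stack)

def multiply(w1, w2):
    """Product of classes [w1][w2] = [w1 w2], returned as a reduced word."""
    return reduce_word(tuple(w1) + tuple(w2))

def invert(word):
    """[b1 ... bn]^{-1} = [bn^{-1} ... b1^{-1}]."""
    return tuple(inverse_symbol(b) for b in reversed(word))

def parse(text):
    """Illustrative notation: space-separated tokens, with 'a^-1' for a^{-1}."""
    word = []
    for tok in text.split():
        if tok.endswith("^-1"):
            word.append((tok[:-3], -1))
        else:
            word.append((tok, 1))
    return tuple(word)

print(reduce_word(parse("a b b^-1 c c^-1 a^-1 c")))  # (('c', 1),)
```

The `invert` function mirrors the two-sided inverse $[b_n^{-1} \cdots b_1^{-1}]$ used in the proof of Theorem 7.1.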

Theorem 7.1. If $S$ is a set and $W(S')$ is the corresponding set of words built
from $S' = S \cup S^{-1}$, then the product operation defined on $W(S')$ descends in a
well-defined fashion to the set $F(S)$ of equivalence classes of members of $W(S')$,
and $F(S)$ thereby becomes a group. Define $\iota : S \to F(S)$ to be the composition
of the inclusion into words of length one followed by passage to equivalence
classes. Then the pair $(F(S), \iota)$ has the following universal mapping property:
whenever $G$ is a group and $\varphi : S \to G$ is a function, then there exists a unique
group homomorphism $\tilde\varphi : F(S) \to G$ such that $\varphi = \tilde\varphi \circ \iota$.

Remark. The group $F(S)$ is called the free group on $S$. Figure 7.1 illustrates
its universal mapping property. The brief form in words of the property is that
any function from $S$ into a group $G$ extends uniquely to a group homomorphism
of $F(S)$ into $G$. This universal mapping property actually characterizes $F(S)$, as
will be seen in Proposition 7.2.

[Diagram: $\varphi : S \to G$, with $\iota : S \to F(S)$ and $\tilde\varphi : F(S) \to G$.]

FIGURE 7.1. Universal mapping property of a free group.

PROOF. Let us denote equivalence classes by brackets. We want to define
multiplication in $F(S)$ by $[w_1][w_2] = [w_1 w_2]$. To see that this formula makes
sense in $F(S)$, let $x_1$, $x_2$, and $y$ be words, and let $b$ be in $S'$. Define $x = x_1 x_2$ and
$x' = x_1 b b^{-1} x_2$, so that $x' \sim x$. Then it is evident that $x'y \sim xy$ and $yx' \sim yx$.
Iteration of this kind of relationship shows that $w_1' \sim w_1$ and $w_2' \sim w_2$ implies
$w_1' w_2' \sim w_1 w_2$, and hence multiplication of equivalence classes is well defined.

Since multiplication in $W(S')$ is associative, we have $[w_1]([w_2][w_3]) = [w_1][w_2w_3] = [w_1(w_2w_3)] = [(w_1w_2)w_3] = [w_1w_2][w_3] = ([w_1][w_2])[w_3]$. Thus multiplication is associative in $F(S)$. The class $[1]$ of the empty word $1$ is a two-sided identity. If $b_1, \dots, b_n$ are in $S'$, then $b_n^{-1}\cdots b_2^{-1}b_1^{-1}b_1b_2\cdots b_n$ is equivalent to $1$, and so is $b_1b_2\cdots b_nb_n^{-1}\cdots b_2^{-1}b_1^{-1}$. Consequently $[b_n^{-1}\cdots b_2^{-1}b_1^{-1}]$ is a two-sided inverse of $[b_1b_2\cdots b_n]$, and $F(S)$ is a group.

Now we address the universal mapping property, first proving the stated uniqueness of the homomorphism. Every member of $F(S)$ is the product of classes $[b]$ with $b$ in $S'$. In turn, if $b$ is of the form $a^{-1}$ with $a$ in $S$, then $[b] = [a]^{-1}$. Hence $F(S)$ is generated by all classes $[a]$ with $a$ in $S$, i.e., by $\iota(S)$. Any homomorphism of a group is determined by its values on the members of a generating set, and uniqueness therefore follows from the formula $\widetilde{\varphi}([a]) = \widetilde{\varphi}(\iota(a)) = \varphi(a)$.

For existence we begin by defining a function $\Phi : W(S') \to G$ such that

$$\begin{aligned} \Phi(a) &= \varphi(a) && \text{for } a \text{ in } S,\\ \Phi(a^{-1}) &= \varphi(a)^{-1} && \text{for } a^{-1} \text{ in } S^{-1},\\ \Phi(w_1w_2) &= \Phi(w_1)\Phi(w_2) && \text{for } w_1 \text{ and } w_2 \text{ in } W(S'). \end{aligned}$$

We use the formulas $\Phi(a) = \varphi(a)$ for $a$ in $S$ and $\Phi(a^{-1}) = \varphi(a)^{-1}$ for $a^{-1}$ in $S^{-1}$ as a definition of $\Phi(b)$ for $b$ in $S'$. Any member of $W(S')$ can be written uniquely as $b_1\cdots b_n$ with each $b_j$ in $S'$, and we set $\Phi(b_1\cdots b_n) = \Phi(b_1)\cdots\Phi(b_n)$. (If $n = 0$, the understanding is that $\Phi(1) = 1$.) Then $\Phi$ has the required properties.

Let us show that $w' \sim w$ implies $\Phi(w') = \Phi(w)$. If $b_1, \dots, b_n$ are in $S'$ and $b$ is in $S'$, then the question is whether

$$\Phi(b_1\cdots b_kbb^{-1}b_{k+1}\cdots b_n) = \Phi(b_1\cdots b_kb_{k+1}\cdots b_n).$$

If $g$ and $g'$ denote the elements $\Phi(b_1)\cdots\Phi(b_k)$ and $\Phi(b_{k+1})\cdots\Phi(b_n)$ of $G$, then the two sides of the queried formula are

$$g\,\Phi(b)\Phi(b^{-1})\,g' \quad\text{and}\quad gg'.$$

Thus the question is whether $\Phi(b)\Phi(b^{-1})$ always equals $1$ in $G$. If $b = a$ is in $S$, this equals $\varphi(a)\varphi(a)^{-1} = 1$, while if $b = a^{-1}$ is in $S^{-1}$, it equals $\varphi(a)^{-1}\varphi(a) = 1$. We conclude that $w' \sim w$ implies $\Phi(w') = \Phi(w)$.

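We may therefore define $\widetilde{\varphi}([w]) = \Phi(w)$ for $[w]$ in $F(S)$. Since $\widetilde{\varphi}([w][w']) = \widetilde{\varphi}([ww']) = \Phi(ww') = \Phi(w)\Phi(w') = \widetilde{\varphi}([w])\widetilde{\varphi}([w'])$, $\widetilde{\varphi}$ is a homomorphism of $F(S)$ into $G$. For $a$ in $S$, we have $\widetilde{\varphi}([a]) = \Phi(a) = \varphi(a)$. In other words, $\widetilde{\varphi}(\iota(a)) = \varphi(a)$. This completes the proof of existence. □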
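The existence half of the proof is constructive: $\Phi$ just multiplies the images of the letters. The Python sketch below, an illustration rather than part of the argument, evaluates $\Phi$ for a map $\varphi$ from $S = \{a, b\}$ into the symmetric group on three points. The encoding is an assumption made here: a word is a string in which a lowercase letter stands for a member of $S$ and the matching uppercase letter for its inverse, and a permutation is a tuple of images. The helper names `compose`, `inverse`, and `extend` are hypothetical.

```python
# Words over S' = S u S^{-1} are strings: a lowercase letter stands for a
# member a of S and the matching uppercase letter for a^{-1} (an encoding
# chosen for this sketch).  Elements of the target group G are permutations
# of {0, ..., n-1}, stored as tuples of images.
from typing import Callable, Dict, Tuple

Perm = Tuple[int, ...]


def compose(p: Perm, q: Perm) -> Perm:
    """Product p*q in G, acting on the left: (p*q)(i) = p(q(i))."""
    return tuple(p[q[i]] for i in range(len(q)))


def inverse(p: Perm) -> Perm:
    """Inverse permutation."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)


def extend(phi: Dict[str, Perm], n: int) -> Callable[[str], Perm]:
    """Return Phi on words: Phi(a) = phi(a), Phi(a^{-1}) = phi(a)^{-1},
    Phi(b_1 ... b_k) = Phi(b_1) ... Phi(b_k), and Phi(empty word) = 1."""
    def Phi(word: str) -> Perm:
        result: Perm = tuple(range(n))  # identity element of G
        for letter in word:
            g = phi[letter.lower()]
            if letter.isupper():  # the letter stands for a^{-1}
                g = inverse(g)
            result = compose(result, g)
        return result
    return Phi


# phi sends a to a transposition and b to a 3-cycle in S_3.
Phi = extend({"a": (1, 0, 2), "b": (1, 2, 0)}, 3)
# Inserting b b^{-1} does not change the value, as the proof shows.
assert Phi("abBa") == Phi("aa") == (0, 1, 2)
```

Because $\Phi$ is computed letter by letter, $\Phi(uv) = \Phi(u)\Phi(v)$ holds for all words $u$ and $v$, which is the third defining property above.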

Proposition 7.2. Let $S$ be a set, $F$ be a group, and $\iota' : S \to F$ be a function. Suppose that the pair $(F, \iota')$ has the following universal mapping property: whenever $G$ is a group and $\varphi : S \to G$ is a function, then there exists a unique group homomorphism $\widetilde{\varphi} : F \to G$ such that $\varphi = \widetilde{\varphi} \circ \iota'$. Then there exists a unique group homomorphism $\Phi : F(S) \to F$ such that $\iota' = \Phi \circ \iota$, and it is a group isomorphism.

Remarks.  Chapter  VI  is  not  a  prerequisite  for  the  present  chapter.  However, 
readers  who  have  been  through  Chapter  VI  will  recognize  that  Proposition  7.2  is 
a  special  case  of  Problem  19  at  the  end  of  that  chapter. 
PROOF. We apply the universal mapping property of $(F(S), \iota)$, as stated in Theorem 7.1, to the group $G = F$ and the function $\varphi = \iota'$, obtaining a group homomorphism $\Phi : F(S) \to F$ such that $\iota' = \Phi \circ \iota$. Then we apply the given universal mapping property of $(F, \iota')$ to the group $G = F(S)$ and the function $\varphi = \iota$, obtaining a group homomorphism $\Psi : F \to F(S)$ such that $\iota = \Psi \circ \iota'$.

The group homomorphism $\Psi \circ \Phi : F(S) \to F(S)$ has the property that $(\Psi \circ \Phi) \circ \iota = \Psi \circ (\Phi \circ \iota) = \Psi \circ \iota' = \iota$, and the identity $1_{F(S)}$ has this same property. By the uniqueness of the group homomorphism in Theorem 7.1, $\Psi \circ \Phi = 1_{F(S)}$.

Similarly the group homomorphism $\Phi \circ \Psi : F \to F$ has the property that $(\Phi \circ \Psi) \circ \iota' = \iota'$, and the identity $1_F$ has this same property. By the uniqueness of the group homomorphism in the assumed universal mapping property of $F$, $\Phi \circ \Psi = 1_F$.

Therefore $\Phi$ is a group isomorphism. We know that $\iota(S)$ generates $F(S)$. If $\Phi' : F(S) \to F$ is another group homomorphism with $\iota' = \Phi' \circ \iota$, then $\Phi'$ and $\Phi$ agree on $\iota(S)$ and therefore have to agree everywhere. Hence $\Phi$ is unique. □

Proposition 7.2 raises the question of recognizing candidates for the set $T = \iota'(S)$ in a given group $F$ so as to be in a position to exhibit $F$ as isomorphic to the free group $F(S)$. Certainly $T$ has to generate $F$. But there is also an independence condition. The idea is that if we form words from the members of $T$, then two words are to lead to equal members of $F$ only if they can be transformed into one another by the same rules that are allowed with free groups.

What this problem amounts to in the case that $F = F(S)$ is that we want a decision procedure for telling whether two given words are equivalent. This is the so-called word problem for the free group. If we think about the matter for a moment, not much is instantly obvious. If $a_1$ and $a_2$ are two distinct members of $S$ and if they are considered as words of length 1, are they equivalent? Equivalence allows for inserting pairs $bb^{-1}$ with $b$ in $S'$, as well as deleting them. Might it be possible to do some complicated iterated insertion and deletion of pairs to transform $a_1$ into $a_2$? Although the negative answer can be readily justified in this situation by a parity argument, it can be justified even more easily by the universal mapping property: there exist groups $G$ with more than one element; we can map $a_1$ to one element of $G$ and $a_2$ to another element of $G$, extend to a homomorphism $\widetilde{\varphi} : F(S) \to G$, see that $\widetilde{\varphi}(\iota(a_1)) \neq \widetilde{\varphi}(\iota(a_2))$, and conclude that $\iota(a_1) \neq \iota(a_2)$. But what about the corresponding problem for two more-complicated words in a free group? Fortunately there is a decision procedure for the word problem in a free group. It involves the notion of “reduced” words. A word in $W(S')$ is said to be reduced if it contains no consecutive pair $bb^{-1}$ with $b$ in $S'$.

Proposition 7.3 (solution of the word problem for free groups). Let $S$ be a set, let $S' = S \cup S^{-1}$, and let $W(S')$ be the corresponding set of words. Then each word in $W(S')$ is equivalent to one and only one reduced word.
Remark. To test whether two words are equivalent, the proposition says to delete pairs $bb^{-1}$ with $b \in S'$ as much as possible from each given word, and to check whether the resulting reduced words are identical.
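The test in the remark can be carried out in one left-to-right pass with a stack: each new letter either cancels the letter on top of the stack or is pushed onto it, so the stack always holds a reduced word. The Python sketch below is illustrative only; it assumes words are strings in which a lowercase letter stands for a member of $S$ and the matching uppercase letter for its inverse, and the function names are hypothetical. That the order in which pairs are deleted does not affect the outcome is exactly what Proposition 7.3 guarantees.

```python
def inv(letter: str) -> str:
    """Inverse of a letter of S' in the assumed encoding: 'a' <-> 'A'."""
    return letter.swapcase()


def reduce_word(word: str) -> str:
    """Delete adjacent pairs b b^{-1} (b in S') until none remain."""
    stack: list = []
    for b in word:
        if stack and stack[-1] == inv(b):
            stack.pop()          # the top of the stack and b form a pair b' b'^{-1}
        else:
            stack.append(b)
    return "".join(stack)


def equivalent(w1: str, w2: str) -> bool:
    """Decide the word problem in F(S) by comparing reduced forms."""
    return reduce_word(w1) == reduce_word(w2)


assert reduce_word("abBAc") == "c"
assert equivalent("ab", "aXxb")
assert not equivalent("a", "b")
```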

Proof. Removal of a pair $bb^{-1}$ with $b \in S'$ decreases the length of a word by 2, and the length has to remain $\ge 0$. Thus the process of successively removing such pairs has to stop after finitely many steps, and the result is a reduced word. This proves that each equivalence class contains a reduced word.

For uniqueness we shall associate to each word a finite sequence of reduced words such that the last member of the sequence is unchanged when we insert or delete within the given word any expression $bb^{-1}$ with $b \in S'$. Specifically if $w = b_1\cdots b_n$, with each $b_i$ in $S'$, is a given word, we associate to $w$ the sequence of words $x_0, x_1, \dots, x_n$ defined inductively by

$$x_0 = 1, \qquad x_1 = b_1, \qquad x_i = \begin{cases} x_{i-1}b_i & \text{if } i \ge 2 \text{ and } x_{i-1} \text{ does not end in } b_i^{-1},\\ y_{i-2} & \text{if } i \ge 2 \text{ and } x_{i-1} = y_{i-2}b_i^{-1}, \end{cases} \tag{$*$}$$

and we define $r(w) = x_n$. Let us see, by induction on $i \ge 0$, that $x_i$ is reduced. The base cases $i = 0$ and $i = 1$ are clear from the definition. Suppose that $i \ge 2$ and that $x_0, \dots, x_{i-1}$ are reduced. If $x_{i-1} = y_{i-2}b_i^{-1}$ for some $y_{i-2}$, then $x_{i-1}$ reduced forces $y_{i-2}$ to be reduced, and hence $x_i = y_{i-2}$ is reduced. If $x_{i-1}$ does not end in $b_i^{-1}$, then the last two symbols of $x_i = x_{i-1}b_i$ do not cancel, and no earlier pair can cancel since $x_{i-1}$ is assumed reduced; hence $x_i$ is reduced. This completes the induction and shows that $x_i$ is reduced for $0 \le i \le n$.

If the word $w = b_1\cdots b_n$ is reduced, then each $x_i$ for $i \ge 2$ is determined by the first of the two choices in $(*)$, and hence $x_i = b_1\cdots b_i$ for all $i$. Consequently $r(w) = w$ if $w$ is reduced. If we can prove for a general word $b_1\cdots b_n$ that

$$r(b_1\cdots b_n) = r(b_1\cdots b_kbb^{-1}b_{k+1}\cdots b_n), \tag{$**$}$$

then it follows that every word $w'$ equivalent to a word $w$ has $r(w') = r(w)$. Since $r(w) = w$ for $w$ reduced, there can be only one reduced word in an equivalence class.

To prove $(**)$, let $x_0, \dots, x_n$ be the finite sequence associated with $b_1\cdots b_n$, and let $x_0', \dots, x_{n+2}'$ be the sequence associated with $b_1\cdots b_kbb^{-1}b_{k+1}\cdots b_n$. Certainly $x_i' = x_i$ for $i \le k$. Let us compute $x_{k+1}'$ and $x_{k+2}'$. From $(*)$ we see that

$$x_{k+1}' = \begin{cases} x_kb & \text{if } x_k \text{ does not end in } b^{-1},\\ y & \text{if } x_k = yb^{-1}. \end{cases}$$

In the first of these cases, $x_{k+1}'$ ends in $b$, and $(*)$ says therefore that $x_{k+2}' = x_k$. In the second of the cases, the fact that $x_k$ is reduced implies that $y$ does not end in $b$; hence $(*)$ says that $x_{k+2}' = yb^{-1} = x_k$. In other words, $x_{k+2}' = x_k$ in both cases. Since the inductive definition of any $x_i$ depends only on $x_{i-1}$ and $b_i$, and similarly for $x_i'$, we see that $x_{k+2+i}' = x_{k+i}$ for $0 \le i \le n - k$. Therefore $x_{n+2}' = x_n$, and $(**)$ follows. This proves the proposition. □
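The sequence $x_0, \dots, x_n$ of $(*)$ is easy to compute, and the two facts the proof relies on, that every $x_i$ is reduced and that $(**)$ holds, can be spot-checked on random words. This is only a sanity check on finitely many examples, not a substitute for the proof; it assumes words are strings with a lowercase letter for a member of $S$ and the matching uppercase letter for its inverse, and the names `r_sequence`, `r`, and `is_reduced` are hypothetical.

```python
import random


def inv(letter: str) -> str:
    """Inverse of a letter in the assumed encoding: 'a' <-> 'A'."""
    return letter.swapcase()


def r_sequence(word: str) -> list:
    """Return [x_0, x_1, ..., x_n] as defined inductively in (*)."""
    xs = [""]                          # x_0 = 1, the empty word
    for b in word:
        prev = xs[-1]
        if prev and prev[-1] == inv(b):
            xs.append(prev[:-1])       # x_{i-1} = y b_i^{-1}, so x_i = y
        else:
            xs.append(prev + b)        # x_i = x_{i-1} b_i
    return xs


def r(word: str) -> str:
    return r_sequence(word)[-1]


def is_reduced(word: str) -> bool:
    return all(word[i + 1] != inv(word[i]) for i in range(len(word) - 1))


rng = random.Random(0)
for _ in range(2000):
    w = "".join(rng.choice("abAB") for _ in range(rng.randint(0, 12)))
    k = rng.randint(0, len(w))
    b = rng.choice("abAB")
    w_inserted = w[:k] + b + inv(b) + w[k:]   # b_1...b_k b b^{-1} b_{k+1}...b_n
    assert all(is_reduced(x) for x in r_sequence(w))
    assert r(w_inserted) == r(w)             # property (**)
```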

Let us return to the problem of recognizing candidates for the set $T = \iota'(S)$ in a given group $F$ so that the subgroup generated by $T$ is a free group. Using the universal mapping property for the free group $F(T)$, we form the group homomorphism of $F(T)$ into $F$ that extends the identity mapping on $T$. We want this homomorphism to be one-one, i.e., to have the property that the only way a word in $F$ built from the members of $T$ can equal the identity is if it comes from the identity. Because of Proposition 7.3 the only reduced word in $F(T)$ that yields the identity is the empty word. Thus the condition that the homomorphism be one-one is that the only image in $F$ of a reduced word in $F(T)$ that can equal the identity is the image of the empty word. Making this condition into a definition, we say that a subset $S = \{g_t \mid t \in T\}$ of $F$ not containing $1$ is free if no nonempty product $h_1h_2\cdots h_m$ in which each $h_i$ or $h_i^{-1}$ is in $S$ and each $h_{i+1}$ is different from $h_i^{-1}$ can be the identity. A free set in $F$ that generates $F$ is called a free basis for $F$.

Example. Within the free group $F(\{x, y\})$ on two generators $x$ and $y$, consider the subgroup generated by $u = x^2$, $v = y^2$, and $w = xy$. The claim is that the subset $\{u, v, w\}$ is free, so that the subgroup generated by $u$, $v$, and $w$ is isomorphic to a free group $F(\{u, v, w\})$ on three generators. We are to check that no nonempty reduced word in $u, v, w, u^{-1}, v^{-1}, w^{-1}$ can reduce to the empty word after substitution in terms of $x$ and $y$. We induct on the length of the $u, v, w$ word, the base case being length 0. Suppose that $v = y^2$ occurs somewhere in our reduced $u, v, w$ word that collapses to the empty word after substitution. Consider what is needed for the left-hand factor of $y$ in the $y^2$ to cancel. The cancellation must result from the presence of some $y^{-1}$. Suppose that this $y^{-1}$ occurs to the left of $y^2$. Since passing to a reduced word need involve only deletions and not insertions of pairs, everything between $y^{-1}$ and $y^2$ must cancel. If the $y^{-1}$ has resulted from $w^{-1} = y^{-1}x^{-1}$, then the number of $x, y$ symbols between $y^{-1}$ and $y^2$ is odd, and an odd number of factors can never cancel. So the $y^{-1}$ must arise from the right-hand $y^{-1}$ in a factor $v^{-1} = y^{-2}$. The symbols between $y^{-2}$ and $y^2$ come from some reduced $u, v, w$ word, and induction shows that this word must be trivial. Then $y^{-2}$ and $y^2$ are adjacent, contradiction. Thus the left factor of $y^2$ must cancel because of some $y^{-1}$ on the right of $y^2$. If the $y^{-1}$ is part of $w^{-1} = y^{-1}x^{-1}$ or is the left $y^{-1}$ in $v^{-1} = y^{-2}$, then the number of $x, y$ symbols between the left $y$ and the $y^{-1}$ is odd, and we cannot get cancellation. So the $y^{-1}$ must be the right-hand $y^{-1}$ in a factor $y^{-2}$. Then we have an expression $y(y\cdots y^{-1})y^{-1}$ in which the symbols in parentheses cancel. The symbols $\cdots$ must cancel also; since these represent some reduced $u, v, w$ word, induction shows that $\cdots$ is empty. We conclude that $y^2$ and $y^{-2}$ are adjacent, contradiction. Thus our reduced $u, v, w$ word contains no factor $v$. Similarly examination of the right-hand factor $x$ in an occurrence of $x^2$ shows that our reduced $u, v, w$ word contains no factor $u$. It must therefore be a product of factors $w$ or a product of factors $w^{-1}$. Substitution of $w = xy$ leads directly without any cancellation to an $x, y$ reduced word, and we conclude that the $u, v, w$ word is empty. Thus the subset $\{u, v, w\}$ is free.
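A brute-force search can corroborate the example on short words, and shows by contrast how a non-free set fails. The sketch below substitutes $u = x^2$, $v = y^2$, $w = xy$ into every nonempty reduced $u, v, w$ word up to a chosen length and freely reduces the result; for comparison it does the same for $a = x$, $b = y$, $c = xy$, where $ab c^{-1}$ collapses. A finite search proves nothing about longer words; the argument above is what does that. The string encoding (uppercase letters for inverses) and all names here are assumptions of the sketch.

```python
from itertools import product

# u = x^2, v = y^2, w = xy; uppercase letters denote inverses, so
# U = x^{-2}, V = y^{-2}, W = (xy)^{-1} = y^{-1} x^{-1}.
SUB_UVW = {"u": "xx", "v": "yy", "w": "xy", "U": "XX", "V": "YY", "W": "YX"}
# For contrast, a set that is not free: a = x, b = y, c = xy.
SUB_ABC = {"a": "x", "b": "y", "c": "xy", "A": "X", "B": "Y", "C": "YX"}


def inv(letter: str) -> str:
    return letter.swapcase()


def reduce_word(word: str) -> str:
    """Free reduction by a single stack pass."""
    stack: list = []
    for c in word:
        if stack and stack[-1] == inv(c):
            stack.pop()
        else:
            stack.append(c)
    return "".join(stack)


def reduced_words(alphabet: str, length: int):
    """Yield the reduced words of the given length over the alphabet."""
    for t in product(alphabet, repeat=length):
        if all(t[i + 1] != inv(t[i]) for i in range(length - 1)):
            yield "".join(t)


def collapsing_words(sub: dict, max_len: int) -> list:
    """Nonempty reduced words whose substitution reduces to the empty word."""
    alphabet = "".join(sub)
    return [word
            for n in range(1, max_len + 1)
            for word in reduced_words(alphabet, n)
            if reduce_word("".join(sub[c] for c in word)) == ""]


print(collapsing_words(SUB_UVW, 5))          # []
print("abC" in collapsing_words(SUB_ABC, 3))  # True: x y (xy)^{-1} = 1
```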

If $G$ is any group, the commutator subgroup $G'$ of $G$ is the subgroup generated by all elements $xyx^{-1}y^{-1}$ with $x \in G$ and $y \in G$.

Proposition 7.4. If $G$ is a group, then the commutator subgroup is normal, and $G/G'$ is abelian. If $\varphi : G \to H$ is any homomorphism of $G$ into an abelian group $H$, then $\ker\varphi \supseteq G'$.

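Proof. The computation

$$axyx^{-1}y^{-1}a^{-1} = (axa^{-1})(aya^{-1})(axa^{-1})^{-1}(aya^{-1})^{-1}$$

shows that each conjugate of a generator of $G'$ is again such a generator; since conjugation by $a$ is an automorphism of $G$, it follows that $G'$ is normal. If $\psi : G \to G/G'$ is the quotient homomorphism, then $\psi(x)\psi(y) = xyG' = xy(y^{-1}x^{-1}yx)G' = yxG' = \psi(y)\psi(x)$, and therefore $G/G'$ is abelian. Finally if $\varphi : G \to H$ is a homomorphism of $G$ into an abelian group $H$, then the computation $\varphi(xyx^{-1}y^{-1}) = \varphi(x)\varphi(y)\varphi(x)^{-1}\varphi(y)^{-1} = \varphi(x)\varphi(x)^{-1}\varphi(y)\varphi(y)^{-1} = 1$ shows that $G' \subseteq \ker\varphi$. □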

Corollary 7.5. If $F$ is the free group on a set $S$ and if $F'$ is the commutator subgroup of $F$, then $F/F'$ is isomorphic to the free abelian group $\bigoplus_{s \in S}\mathbb{Z}_s$.

PROOF. Let $H = \bigoplus_{s \in S}\mathbb{Z}_s$, and let $\varphi : S \to H$ be the function with $\varphi(s) = 1_s$, i.e., $\varphi(s)$ is to be the member of $H$ that is 1 in the $s$th coordinate and is 0 elsewhere. Application of the universal mapping property of $F$ as given in Theorem 7.1 yields a group homomorphism $\widetilde{\varphi} : F \to H$ such that $\widetilde{\varphi} \circ \iota = \varphi$. Since the elements $\varphi(s)$, with $s$ in $S$, generate $H$, $\widetilde{\varphi}$ carries $F$ onto $H$. Since $H$ is abelian, Proposition 7.4 shows that $\ker\widetilde{\varphi} \supseteq F'$. Proposition 4.11 shows that $\widetilde{\varphi}$ descends to a homomorphism $\widetilde{\varphi}_0 : F/F' \to H$, and $\widetilde{\varphi}_0$ has to be onto $H$.

To complete the proof, we show that $\widetilde{\varphi}_0$ is one-one. Let $x$ be a member of $F$. Since the products of the elements $\iota(s)$ and their inverses generate $F$ and since $F/F'$ is abelian, we can write $xF' = \iota(s_{i_1})^{j_1}\cdots\iota(s_{i_n})^{j_n}F'$, where $s_{i_1}$ occurs a total of $j_1$ times in $x$, ..., and $s_{i_n}$ occurs a total of $j_n$ times in $x$; it is understood that an occurrence of $s_{i_1}^{-1}$ is to contribute $-1$ toward $j_1$. Then we have $\widetilde{\varphi}_0(xF') = j_1\varphi(s_{i_1}) + \cdots + j_n\varphi(s_{i_n})$. If $\widetilde{\varphi}_0(xF') = 0$, we obtain $j_1\varphi(s_{i_1}) + \cdots + j_n\varphi(s_{i_n}) = 0$, and then $j_1 = \cdots = j_n = 0$ since the elements $\varphi(s_{i_1}), \dots, \varphi(s_{i_n})$ are members of a $\mathbb{Z}$ basis of $H$. Hence $xF' = F'$, $x$ is in $F'$, and $\widetilde{\varphi}_0$ is one-one. □
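Concretely, the homomorphism $\widetilde{\varphi}$ of this proof sends a word to its vector of exponent sums, one integer for each $s$ in $S$. The sketch below computes that vector for words encoded as strings (a lowercase letter for a member of $S$, the uppercase letter for its inverse, an encoding assumed here) and checks that commutators are sent to $0$, as Proposition 7.4 requires. The function names are hypothetical.

```python
from collections import Counter


def exponent_sums(word: str) -> dict:
    """Image of a word in (+)_{s in S} Z_s: the exponent sum of each s,
    with zero coordinates omitted.  Uppercase letters count -1."""
    total: Counter = Counter()
    for c in word:
        total[c.lower()] += -1 if c.isupper() else 1
    return {s: k for s, k in total.items() if k != 0}


def inverse_word(word: str) -> str:
    """(b_1 ... b_n)^{-1} = b_n^{-1} ... b_1^{-1}."""
    return "".join(c.swapcase() for c in reversed(word))


def commutator(g: str, h: str) -> str:
    """The word g h g^{-1} h^{-1}."""
    return g + h + inverse_word(g) + inverse_word(h)


assert exponent_sums(commutator("ab", "bAc")) == {}
assert exponent_sums("abA" + "bbC") == {"b": 3, "c": -1}
```

By Corollary 7.5, two words have the same exponent sums exactly when they lie in the same coset of $F'$; for instance $ab$ and $ba$ both map to $\{a: 1, b: 1\}$.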

Corollary 7.6. If $F_1$ and $F_2$ are isomorphic free groups on sets $S_1$ and $S_2$, respectively, then $S_1$ and $S_2$ have the same cardinality.

PROOF. Corollary 7.5 shows that an isomorphism of $F_1$ with $F_2$ induces an isomorphism of the free abelian groups $\bigoplus_{s \in S_1}\mathbb{Z}_s$ and $\bigoplus_{s \in S_2}\mathbb{Z}_s$. The rank of a free abelian group is a well-defined cardinal, and the result follows (almost).

We did not completely prove this fact about the rank of a free abelian group in Section IV.9. Theorem 4.53 did prove, however, that rank is well defined for finitely generated free abelian groups. Thus the corollary follows if $S_1$ and $S_2$ are finite. If $S_1$ or $S_2$ is uncountable, then the cardinality of the corresponding free abelian group matches the cardinality of its $\mathbb{Z}$ basis; hence the corollary follows if $S_1$ or $S_2$ is uncountable. The only remaining case to eliminate is that one of $S_1$ and $S_2$, say the first of them, has a countably infinite $\mathbb{Z}$ basis and the other has finite rank $n$. The first of the groups then has a linearly independent set of $n + 1$ elements, and Lemma 4.54 shows that the span of these elements cannot be isomorphic to a subgroup of a free abelian group of rank $n$. This completes the proof in all cases. □

Because  of  Corollary  7.6,  it  is  meaningful  to  speak  of  the  rank  of  a  free  group; 
it  is  the  cardinality  of  any  free  basis.  We  shall  see  in  the  next  section  that  any 
subgroup  of  a  free  group  is  free.  In  contrast  to  the  abelian  case,  however,  the  rank 
may  actually  increase  in  passing  from  a  free  group  to  one  of  its  subgroups:  the 
example  earlier  in  this  section  exhibited  a  free  group  of  rank  3  as  a  subgroup  of 
a  free  group  of  rank  2. 

We turn to a way of describing general groups, particularly groups that are at most countable. The method uses “generators,” which we already understand, and “relations,” which are defined in terms of free groups. Let $S$ be a set, let $R$ be a subset of $F(S)$, and let $N(R)$ be the smallest normal subgroup of $F(S)$ containing $R$. The group $G = F(S)/N(R)$ is sometimes written as $G = \langle S; R\rangle$ or as

$$G = \langle \text{elements of } S;\ \text{elements of } R\rangle,$$

with the elements of $S$ and $R$ listed rather than grouped as a set. Either of these expressions is called a presentation of $G$. The set $S$ is a set of generators, and the set $R$ is the corresponding set of relations. The following result, implicit in the universal mapping property of Theorem 7.1, shows the scope of this definition.
Proposition 7.7. Each group $G$ is the homomorphic image of a free group.

PROOF. Let $S$ be a set of generators for $G$; for example, $S$ can be taken to be $G$ itself. Let $\varphi : S \to G$ be the inclusion of the set of generators into $G$, and let $\widetilde{\varphi} : F(S) \to G$ be the group homomorphism of Theorem 7.1 such that $\widetilde{\varphi}(\iota(s)) = \varphi(s)$ for all $s$ in $S$. The image of $\widetilde{\varphi}$ is a subgroup of $G$ that contains the generating set $S$ and is therefore equal to all of $G$. Thus $\widetilde{\varphi}$ is the required homomorphism. □

If $G$ is any group and $\widetilde{\varphi} : F(S) \to G$ is the homomorphism given in Proposition 7.7, then the subgroup $R = \ker\widetilde{\varphi}$ has the property that $G = \langle S; R\rangle$. Consequently every group can be given by generators and relations.

For example the proof of the proposition shows that one possibility is to take $S = G$ and $R$ equal to the set of all members of the multiplication table, but with the multiplication table entry $ss' = s''$ rewritten as the left side $ss'(s'')^{-1}$ of an equation $ss'(s'')^{-1} = 1$ specifying a combination of generators that maps to $1$. This is of course not a very practical example. Generators and relations are most useful when $S$ and $R$ are fairly small. One says that $G$ is finitely generated if $S$ can be chosen to be finite, and finitely presented if both $S$ and $R$ can be chosen to be finite.


A  frequently  used  device  in  working  with  generators  and  relations  is  the 
following  simple  proposition. 

Proposition 7.8. Let $G = \langle S; R\rangle$ be a group given by generators and relations, let $G'$ be a second group, let $\varphi$ be a one-one function from $S$ onto a set of generators for $G'$, and let $\Phi : F(S) \to G'$ be the extension of $\varphi$ to a group homomorphism. If $\Phi(r) = 1$ for every member $r$ of $R$, then $\Phi$ descends to a homomorphism of $G$ onto $G'$. In particular, if $G = \langle S; R\rangle$ and $G' = \langle S; R'\rangle$ are groups given by generators and relations with $R \subseteq R'$, then the natural homomorphism of $F(S)$ onto $G'$ descends to a homomorphism of $G$ onto $G'$.

PROOF. The proposition follows immediately from the universal mapping property in Theorem 7.1 in combination with Proposition 4.11. □

Now let us consider some examples of groups given by generators and relations. The case of one generator is something we already understand: the group has to be cyclic. A presentation of $\mathbb{Z}$ is $\langle a;\ \rangle$, and a presentation of $C_n$ is $\langle a; a^n\rangle$. But other presentations are possible with one generator, such as $\langle a; a^6, a^9\rangle$ for $C_3$. Here is an example with two generators.
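Before turning to that example, note that the one-generator case can be settled mechanically. Since $F(\{a\})$ is infinite cyclic, hence abelian, the normal subgroup generated by $a^{n_1}, \dots, a^{n_k}$ is the subgroup generated by $a^d$ with $d = \gcd(n_1, \dots, n_k)$, so $\langle a; a^{n_1}, \dots, a^{n_k}\rangle \cong C_d$, read as $\mathbb{Z}$ when $d = 0$. A short Python sketch (the function name is hypothetical):

```python
from functools import reduce
from math import gcd
from typing import Iterable, Optional


def one_generator_order(exponents: Iterable[int]) -> Optional[int]:
    """Order of <a; a^{n_1}, ..., a^{n_k}>, or None if the group is Z.

    F({a}) is infinite cyclic, so the normal subgroup generated by the
    relations is generated by a^d with d = gcd(n_1, ..., n_k)."""
    d = reduce(gcd, exponents, 0)   # the gcd of no exponents is 0
    return None if d == 0 else d


assert one_generator_order([6, 9]) == 3     # <a; a^6, a^9> is C_3
assert one_generator_order([]) is None      # <a; > is Z
```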
Example. Let us prove that $D_n = \langle x, y;\ x^n, y^2, (xy)^2\rangle$, where $D_n$ is the dihedral group of order $2n$. Concretely let us work with $D_n$ as the group of 2-by-2 real matrices generated by $\begin{pmatrix}\cos 2\pi/n & -\sin 2\pi/n\\ \sin 2\pi/n & \cos 2\pi/n\end{pmatrix}$ and $\begin{pmatrix}1 & 0\\ 0 & -1\end{pmatrix}$. The generated group indeed has order $2n$. If we identify

$$x \text{ with } \begin{pmatrix}\cos 2\pi/n & -\sin 2\pi/n\\ \sin 2\pi/n & \cos 2\pi/n\end{pmatrix} \quad\text{and}\quad y \text{ with } \begin{pmatrix}1 & 0\\ 0 & -1\end{pmatrix},$$

then $y^2 = 1$, and the formula

$$\begin{pmatrix}\cos 2\pi/n & -\sin 2\pi/n\\ \sin 2\pi/n & \cos 2\pi/n\end{pmatrix}^k = \begin{pmatrix}\cos 2\pi k/n & -\sin 2\pi k/n\\ \sin 2\pi k/n & \cos 2\pi k/n\end{pmatrix}$$

shows that $x^n = 1$. In addition, $xy = \begin{pmatrix}\cos 2\pi/n & \sin 2\pi/n\\ \sin 2\pi/n & -\cos 2\pi/n\end{pmatrix}$, and the square of this is the identity. By Proposition 7.8, $D_n$ is a homomorphic image of $\widetilde{D}_n = \langle x, y;\ x^n, y^2, (xy)^2\rangle$. To complete the identification, it is enough to show that the order of $\widetilde{D}_n$ is $\le 2n$, because the homomorphism of $\widetilde{D}_n$ onto $D_n$ must then be one-one. In $\langle x, y;\ x^n, y^2, (xy)^2\rangle$, we compute that $y^{-1} = y$ and that $x(yx)y = 1$ implies $yx = x^{-1}y^{-1} = x^{-1}y$. Induction then yields $yx^k = x^{-k}y$ for $k \ge 0$. Multiplying left and right by $y$ gives $yx^{-k} = x^ky$ for $k \ge 0$. So $yx^l = x^{-l}y$ for every integer $l$. This means that every element is of the form $x^m$ or $x^my$, and we may take $0 \le m \le n - 1$. Hence there are at most $2n$ elements.
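The relations and the count of $2n$ elements can be checked by machine for small $n$. To avoid floating-point matrices, the sketch below swaps in a different faithful model of $D_n$, an assumption of this illustration rather than the text's: $x$ acts on the vertices $0, \dots, n-1$ of a regular $n$-gon by $i \mapsto i + 1 \bmod n$ and $y$ by $i \mapsto -i \bmod n$. For $n \ge 3$ this action is faithful. The code checks $x^n = y^2 = (xy)^2 = 1$, generates the group, and confirms that it consists exactly of the $2n$ elements $x^m$ and $x^my$ with $0 \le m \le n - 1$. The helper names are hypothetical.

```python
from typing import Set, Tuple

Perm = Tuple[int, ...]


def compose(p: Perm, q: Perm) -> Perm:
    """(p*q)(i) = p(q(i))."""
    return tuple(p[q[i]] for i in range(len(q)))


def power(p: Perm, k: int) -> Perm:
    result: Perm = tuple(range(len(p)))
    for _ in range(k):
        result = compose(result, p)
    return result


def dihedral_check(n: int) -> Tuple[int, bool]:
    """Return (order of <x, y>, whether it equals {x^m, x^m y : 0 <= m < n})."""
    identity: Perm = tuple(range(n))
    x: Perm = tuple((i + 1) % n for i in range(n))   # rotation through 2*pi/n
    y: Perm = tuple((-i) % n for i in range(n))      # reflection fixing vertex 0
    xy = compose(x, y)
    assert power(x, n) == identity and compose(y, y) == identity
    assert compose(xy, xy) == identity
    # Closure of {1} under right multiplication by x and y is the group <x, y>.
    group: Set[Perm] = {identity}
    frontier = [identity]
    while frontier:
        g = frontier.pop()
        for s in (x, y):
            h = compose(g, s)
            if h not in group:
                group.add(h)
                frontier.append(h)
    normal_forms = {power(x, m) for m in range(n)}
    normal_forms |= {compose(power(x, m), y) for m in range(n)}
    return len(group), group == normal_forms


for n in range(3, 9):
    assert dihedral_check(n) == (2 * n, True)
```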

Without  trying  to  be  too  precise,  let  us  mention  that  the  word  problem  for 
finitely  presented  groups  is  to  give  an  algorithm  for  deciding  whether  two  words 
represent  the  same  element  of  the  group.  It  is  known  that  there  is  no  such 
algorithm  applicable  to  all  finitely  presented  groups.  Of  course,  there  can  be 
such  an  algorithm  for  certain  special  classes  of  presentations.  For  example,  if 
there  are  no  relations  in  the  presentation,  then  the  group  is  a  free  group,  and 
Proposition  7.3  gives  a  solution  in  this  case.  There  tends  to  be  a  solution  for  a 
class  of  groups  if  the  groups  all  correspond  rather  concretely  to  some  geometric 
situation,  such  as  a  tiling  of  Euclidean  space  or  some  other  space.  The  example 
above  with  Dn  is  of  this  kind. 

By way of a concrete class of examples, one can identify any doubly generated group of the form $\langle x, y;\ x^a, y^b, (xy)^c\rangle$ if $a, b, c$ are integers $> 1$, and one can describe what words represent what elements in these groups. These groups all correspond to tilings in 2 dimensions. In fact, let $\gamma = a^{-1} + b^{-1} + c^{-1}$. If $\gamma > 1$, the tiling is of the Riemann sphere, and the group is finite. If $\gamma = 1$, the tiling is of the Euclidean plane $\mathbb{R}^2$, and the group is infinite. If $\gamma < 1$, the tiling is of the hyperbolic plane, and the group is infinite. In all cases one starts from a triangle in the appropriate geometry with angles $\pi/a$, $\pi/b$, and $\pi/c$, and a basic tile consists of the double of this triangle obtained by reflecting the triangle about any of its sides. The group elements $x$, $y$, and $xy$ are rotations, suitably oriented, about the vertices of the triangle through respective angles $2\pi/a$, $2\pi/b$, and $2\pi/c$. Further information about the cases $\gamma > 1$ and $\gamma = 1$ is obtained in Problems 37-46 at the end of the chapter.
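The trichotomy by $\gamma$ is simple arithmetic, done exactly with fractions below. The sketch also reports the order $2/(\gamma - 1)$ in the finite case; that formula is a standard fact about these groups which is not derived in this section, so treat it as an outside assumption. The function name `classify` is hypothetical.

```python
from fractions import Fraction
from typing import Optional, Tuple


def classify(a: int, b: int, c: int) -> Tuple[str, Optional[int]]:
    """Geometry tiled by <x, y; x^a, y^b, (xy)^c>, and the order if finite."""
    if min(a, b, c) <= 1:
        raise ValueError("a, b, c must be integers > 1")
    gamma = Fraction(1, a) + Fraction(1, b) + Fraction(1, c)
    if gamma > 1:
        order = 2 / (gamma - 1)      # standard order formula (assumed here)
        return "sphere", int(order)
    if gamma == 1:
        return "Euclidean plane", None
    return "hyperbolic plane", None


assert classify(5, 2, 2) == ("sphere", 10)    # D_5, as in the dihedral example
assert classify(2, 3, 6) == ("Euclidean plane", None)
assert classify(2, 3, 7) == ("hyperbolic plane", None)
```

The first check agrees with the dihedral example above: $\langle x, y; x^n, y^2, (xy)^2\rangle$ has $\gamma = 1 + 1/n$, and $2/(\gamma - 1) = 2n$.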

We  conclude  with  one  further  example  of  a  presentation  whose  group  we  can 
readily  identify  concretely. 

Proposition 7.9. Let $S$ be a set, and let $R = \{sts^{-1}t^{-1} \mid s \in S, t \in S\}$. Then the smallest normal subgroup of the free group $F(S)$ containing $R$ is the commutator subgroup $F(S)'$, and therefore $\langle S; R\rangle$ is isomorphic to the free abelian group $\bigoplus_{s \in S}\mathbb{Z}_s$.

PROOF. The members of $R$ are in $F(S)'$, the product of two members of $F(S)'$ is in $F(S)'$, and any conjugate of a member of $F(S)'$ is in $F(S)'$. Therefore the smallest normal subgroup $N(R)$ containing $R$ has $N(R) \subseteq F(S)'$. Let $\varphi : F(S) \to F(S)/N(R)$ be the quotient homomorphism. Elements of the quotient $F(S)/N(R)$ may be expressed as words in the elements $\varphi(s)$ and $\varphi(s)^{-1}$ for $s$ in $S$, and the factors commute because of the definition of $R$. Therefore $F(S)/N(R)$ is abelian. By Proposition 7.4, $N(R) \supseteq F(S)'$. Therefore $N(R) = F(S)'$. This proves the first conclusion, and the second conclusion follows from Corollary 7.5. □


2.  Subgroups  of  Free  Groups 

The  main  result  of  this  section  is  that  any  subgroup  of  a  free  group  is  a  free  group. 
An  example  in  the  previous  section  shows  that  the  rank  can  actually  increase  in 
the  process  of  passing  to  the  subgroup. 

The proof of the main result is ostensibly subtle but is relatively easy to understand in topological terms. Although we shall give the topological interpretation, we shall not pursue it further, and the proof that we give may be regarded as a translation of the topological proof into the language of algebra, combined with some steps of beautification.

For purposes of the topological argument, let us think of the given free group for the moment as finitely generated, and let us suppose that the subgroup has finite index. A free group on $n$ symbols is the fundamental group of a bouquet of $n$ circles, all joined at a single point, which we take as the base point. By the theory of covering spaces, any subgroup of index $k$ is the fundamental group of some $k$-sheeted covering space of the bouquet of circles. This covering space is a 1-dimensional simplicial complex, and one can prove with standard tools that the fundamental group of any 1-dimensional simplicial complex is a free group. The theorem follows.
If  the  special  hypotheses  are  dropped  that  the  given  free  group  is  finitely
generated  and  the  subgroup  has  finite  index,  then  the  same  proof  is  applicable  as 
long  as  one  allows  a  suitable  generalization  of  the  notion  of  simplicial  complex. 
Thus  the  topological  argument  is  completely  general. 

The  theorem  then  is  as  follows. 

Theorem  7.10  (Nielsen-Schreier  Theorem).  Every  subgroup  of  a  free  group 
is  a  free  group. 

Remarks.  The  algebraic  proof  will  occupy  the  remainder  of  the  section  but 
will  occasionally  be  interrupted  by  comments  about  the  example  in  the  previous 
section. 

Let  the  given  free  group  be  F,  let  the  subgroup  be  H,  and  form  the  right  cosets 
Hg  in  F.  Let  C  be  a  set  of  representatives  for  these  cosets,  with  1  chosen  as 
the  representative  of  the  identity  coset;  we  shall  impose  further  conditions  on  C 
shortly. 

Example, continued. For the example in the previous section, we were given a free group $F$ with two generators $x, y$, and the subgroup $H$ is taken to have generators $x^2$, $xy$, $y^2$. In fact, one readily checks that $H$ is the subgroup formed from all words of even length, and we shall think of it that way. The set $C$ of coset representatives may be taken to be $\{1, x\}$ in this case. The argument we gave that $H$ is free has points of contact with the proof we give of Theorem 7.10 but is not an exact special case of it. One point of contact is that within each generator of $H$ that we identify, there is some particular factor that does not cancel when that generator appears in a word representing a member of the subgroup.

We define a function $p : F \to C$ by taking $p(x)$ to be the coset representative of the member $x$ of $F$. This function has the property that $p(hx) = p(x)$ for all $h$ in $H$ and $x$ in $F$. Also, $x \mapsto xp(x)^{-1}$ is a function from $F$ to $H$, and it is the identity function on $H$. The first lemma shows that a relatively small subset of the elements $xp(x)^{-1}$ is a set of generators of $H$.

Lemma 7.11. Let S be the set of generators of F, and let $S' = S \cup S^{-1}$. Every element of H is a product of elements of the form $gb\,p(gb)^{-1}$ with g in C and b in S'. Furthermore the element $g' = p(gb^{-1})$ of C has the properties that $g = p(g'b)$ and that $gb^{-1}p(gb^{-1})^{-1} = \big(g'b\,p(g'b)^{-1}\big)^{-1}$. Consequently the elements $ga\,p(ga)^{-1}$ with g in C and a in S form a set of generators of H.

Example, continued. In the example, we are taking $C = \{1, x\}$ and $S = \{x, y\}$. The elements $gb\,p(gb)^{-1}$ obtained with $g = 1$ and b equal to $x, y, x^{-1}, y^{-1}$ are $1$, $yx^{-1}$, $x^{-1}x^{-1}$, and $y^{-1}x^{-1}$. The elements $gb\,p(gb)^{-1}$ obtained with $g = x$ and b equal to $x, y, x^{-1}, y^{-1}$ are $xx$, $xy$, $1$, and $xy^{-1}$. The lemma says that $1$, $yx^{-1}$, $xx$, and $xy$ form a set of generators of H and that the elements $x^{-1}x^{-1}$, $y^{-1}x^{-1}$, $1$, and $xy^{-1}$ are inverses of these generators in some order.
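The entries in this example can be recomputed mechanically. The Python sketch below is only an illustration under assumptions of our own: a word in F is encoded as a string over x, y, X, Y, with X and Y standing for $x^{-1}$ and $y^{-1}$; the hypothetical helper `rep` plays the role of p for this particular H (the words of even length) with $C = \{1, x\}$; and the empty string stands for 1.

```python
def inverse(word: str) -> str:
    """Inverse of a word: reverse it and swap each letter with its inverse (x <-> X)."""
    return word[::-1].swapcase()


def reduce_word(word: str) -> str:
    """Freely reduce a word by cancelling adjacent pairs such as xX or Yy."""
    stack: list[str] = []
    for letter in word:
        if stack and stack[-1] == letter.swapcase():
            stack.pop()
        else:
            stack.append(letter)
    return "".join(stack)


def rep(word: str) -> str:
    """Hypothetical p for H = words of even length and C = {1, x}.

    Free reduction removes letters in pairs, so the parity of the length
    does not depend on whether the word has been reduced.
    """
    return "" if len(word) % 2 == 0 else "x"


def schreier_element(g: str, b: str) -> str:
    """The element g b p(gb)^(-1) of H, freely reduced ('' means 1)."""
    gb = g + b
    return reduce_word(gb + inverse(rep(gb)))


C = ["", "x"]
S_PRIME = ["x", "y", "X", "Y"]

for g in C:
    print(g or "1", [schreier_element(g, b) or "1" for b in S_PRIME])
```

Reading X as $x^{-1}$ and Y as $y^{-1}$, the two printed rows reproduce the eight elements listed above.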

Remark. The lemma needs no hypothesis that F is free. A nontrivial application of the lemma with F not free appears in Problem 43 at the end of the chapter.

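PROOF. Any h in F can be written as a product $h = b_1 \cdots b_n$ with each $b_j$ in S'. Define $r_0 = 1$ and $r_k = p(b_1 \cdots b_k)$ for $1 \le k \le n$. Then

$$h\,r_n^{-1} = (r_0 b_1 r_1^{-1})(r_1 b_2 r_2^{-1}) \cdots (r_{n-1} b_n r_n^{-1}). \qquad (*)$$

Since

$$r_k = p(b_1 \cdots b_k) = p(b_1 \cdots b_{k-1} b_k) = p\big(p(b_1 \cdots b_{k-1})\,b_k\big) = p(r_{k-1} b_k)$$

(the third equality holds because $b_1 \cdots b_{k-1}$ lies in $Hr_{k-1}$), we have $r_{k-1} b_k r_k^{-1} = gb\,p(gb)^{-1}$ with $g = r_{k-1}$ and $b = b_k$. Thus (*) exhibits $h\,r_n^{-1}$ as a product of elements as in the first conclusion of the lemma. Since $r_n = p(b_1 \cdots b_n) = p(h)$, $r_n = 1$ if h is in H. Therefore in this case, h itself is a product of elements as in the statement of that conclusion, and that conclusion is now proved.

For the other conclusion, let $gb^{-1}p(gb^{-1})^{-1}$ be given, and put $g' = p(gb^{-1})$, so that $gb^{-1}g'^{-1} = h$ is in H. This equation implies that $g'b = h^{-1}g$. Hence $p(g'b) = p(h^{-1}g) = p(g) = g$, and it follows that $gb^{-1}p(gb^{-1})^{-1} = gb^{-1}g'^{-1} = (g'bg^{-1})^{-1} = \big(g'b\,p(g'b)^{-1}\big)^{-1}$. In particular, if b is in S, this shows that $gb^{-1}p(gb^{-1})^{-1}$ is the inverse of $g'b\,p(g'b)^{-1}$, which is one of the elements $ga\,p(ga)^{-1}$ with a in S; hence these elements already generate H. This proves the lemma. □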
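The first half of the proof is effectively an algorithm: scan $h = b_1 \cdots b_n$ from the left, compute $r_k = p(b_1 \cdots b_k)$, and emit the factor $r_{k-1} b_k r_k^{-1}$. Here is a sketch of that step for the running example, assuming the same string encoding (X, Y for the inverses of x, y) and hypothetical helper names; it is not tied to any library.

```python
def inverse(word):
    """Inverse of a word: reverse it and swap x <-> X, y <-> Y."""
    return word[::-1].swapcase()


def reduce_word(word):
    """Freely reduce by cancelling adjacent inverse letters."""
    stack = []
    for letter in word:
        if stack and stack[-1] == letter.swapcase():
            stack.pop()
        else:
            stack.append(letter)
    return "".join(stack)


def rep(word):
    """Hypothetical p for H = even-length words and C = {1, x}."""
    return "" if len(word) % 2 == 0 else "x"


def rewrite(h):
    """Return the factors r_{k-1} b_k r_k^(-1) of (*) and the last value r_n."""
    factors = []
    r_prev = ""                              # r_0 = 1
    for k in range(1, len(h) + 1):
        r_k = rep(h[:k])                     # r_k = p(b_1 ... b_k)
        factors.append(reduce_word(r_prev + h[k - 1] + inverse(r_k)))
        r_prev = r_k
    return factors, r_prev


factors, r_n = rewrite("xyyx")
print(factors, repr(r_n))
```

For $h = xyyx$, which has even length and so lies in H, the factors are $1$, $xy$, $yx^{-1}$, $xx$, the final value $r_n$ is 1, and multiplying the factors back together and reducing returns h, as (*) predicts.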

Lemma 7.12. With F free it is possible to choose the set C of coset representatives in such a way that all of its members have expansions in terms of S' as $g = b_1 \cdots b_n$ in which

(a) $g = b_1 b_2 \cdots b_n$ is a reduced word as written,

(b) $b_1 b_2 \cdots b_{n-1}$ is also a member of C.

Remarks. It is understood from the case of $n = 1$ in (b) that 1 is the representative of the identity coset. When C is chosen as in this lemma, C is said to be a Schreier set. In the example, $C = \{1, x\}$ is a Schreier set. So is $C = \{1, y\}$, and hence the selection of a Schreier set may involve a choice.

PROOF. If S' is finite or countably infinite, we enumerate it. In the uncountable case (which is of less practical interest), we introduce a well ordering in S' by means of Zermelo’s Well-Ordering Theorem as in Section A5 of the appendix.

The ordering of S' will be used to define a lexicographic ordering of the set of all reduced words in the members of S'. If

$$x = b_1 \cdots b_m \quad\text{and}\quad y = b'_1 \cdots b'_n \qquad (*)$$

are reduced words with $m \le n$, we say that $x < y$ if any of the following hold:

(i) $m < n$,

(ii) $m = n$ and $b_1 < b'_1$,

(iii) $m = n$ and for some $k < m$, $b_1 = b'_1, \dots, b_k = b'_k$, and $b_{k+1} < b'_{k+1}$.

With this definition the set of reduced words is well ordered, and hence any nonempty subset of reduced words has a least element.
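For a finite or countable S', the ordering just defined is what is often called shortlex order: compare lengths first, then compare letter by letter. A minimal Python sketch, assuming a hypothetical enumeration $x < y < x^{-1} < y^{-1}$ of S' and the same string encoding (X, Y for inverses), is a sort key:

```python
ORDER = "xyXY"  # hypothetical enumeration x < y < x^(-1) < y^(-1) of S'


def shortlex_key(word):
    """Sort key realizing (i)-(iii): length first, then the first differing letter."""
    return (len(word), [ORDER.index(letter) for letter in word])


words = ["yX", "x", "", "xy", "Y", "XX"]
print(sorted(words, key=shortlex_key))
```

Python compares the tuples by length first and then compares the lists of letter positions lexicographically, which is exactly rule (i) followed by rules (ii) and (iii).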

Let us observe that if x, y, z are reduced words with $x < y$ and if yz is reduced as written, then $xz < yz$ after xz has been reduced. In fact, let us assume that x and y are as in (*) and that the length of z is r. The assumption is that yz has length $n + r$, and the length of xz is at most $m + r$. If $m < n$, then certainly $xz < yz$. If $m = n$ and xz fails to be reduced, then the length of xz is less than the length of yz, and $xz < yz$. If $m = n$ and xz is reduced, then xz and yz begin with x and y, respectively, and the first index k with $b_k \ne b'_k$ has $b_k < b'_k$; this shows that $xz < yz$.

To define the set C of coset representatives, let the representative of Hg be the least member of the set Hg, each element being written as a reduced word. Since the length of the empty word is 0, the representative of the identity coset H is 1 under this definition. Thus all we have to check is that an initial segment of a member of C is again in C.

Suppose that $b_1 \cdots b_n$ is in C, so that $b_1 \cdots b_n$ is the least element of $Hb_1 \cdots b_n$. Denote the least element of $Hb_1 \cdots b_{n-1}$ by g. If $g = b_1 \cdots b_{n-1}$, we are done. Otherwise $g < b_1 \cdots b_{n-1}$, and then the fact that $b_1 \cdots b_n$ is reduced implies, by the observation above, that $gb_n < b_1 \cdots b_n$. But $gb_n$ is in $Hb_1 \cdots b_n$, and this inequality contradicts the minimality of $b_1 \cdots b_n$ in that coset. Thus we conclude that $g = b_1 \cdots b_{n-1}$. This proves the lemma. □
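When membership of a word in a coset can be computed, the proof suggests a direct way to produce a Schreier set: list reduced words in increasing order and keep the first word met in each coset. The sketch below assumes a hypothetical subgroup H of index 3 (the words whose total exponent sum is divisible by 3, so that the coset of a word is recorded by its exponent sum mod 3), the same string encoding, and a search bound `max_len`; none of these come from the text.

```python
from itertools import product

ORDER = "xyXY"  # hypothetical enumeration x < y < x^(-1) < y^(-1) of S'


def is_reduced(word: str) -> bool:
    """True when no letter is followed by its inverse."""
    return all(a != b.swapcase() for a, b in zip(word, word[1:]))


def coset_index(word: str) -> int:
    """Hypothetical H: total exponent sum divisible by 3; returns the coset label."""
    return sum(1 if letter.islower() else -1 for letter in word) % 3


def least_representatives(num_cosets: int, max_len: int = 6) -> dict[int, str]:
    """Least reduced word in each coset, searching lengths 0, 1, ..., max_len."""
    reps: dict[int, str] = {}
    for n in range(max_len + 1):
        # product() yields the words of length n in lexicographic order of ORDER
        for letters in product(ORDER, repeat=n):
            word = "".join(letters)
            if is_reduced(word):
                reps.setdefault(coset_index(word), word)
        if len(reps) == num_cosets:
            break
    return reps


C = least_representatives(3)
print(C)
```

Here the least representatives are 1, x, and $x^{-1}$, and removing the last letter of any nonempty representative lands back in C, which is property (b) of Lemma 7.12.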

For the remainder of the proof of Theorem 7.10, we assume, as we may by Lemma 7.12, that the set C of coset representatives is a Schreier set. Typical elements of S will be denoted by a, and typical elements of $S' = S \cup S^{-1}$ will be denoted by b. Let us write u for a typical element $ga\,p(ga)^{-1}$ with g in C, and let us write v for a typical element $gb\,p(gb)^{-1}$ with g in C. The elements u generate H by Lemma 7.11, and each element v is either an element u or the inverse of an element u, according to the lemma. We shall prove that the elements u not equal to 1 are distinct and form a free basis of H.

First we prove that each of the elements $v = gb\,p(gb)^{-1}$ either is reduced as written or is equal to 1. Put $g' = p(gb)$, so that $v = gbg'^{-1}$. Since g and g' are in the Schreier set C, they are reduced as written, and hence so are g and $g'^{-1}$. Thus the only possible cancellation in v occurs because the last factor of g is $b^{-1}$ or the last factor of g' is b. If the last factor of g is $b^{-1}$, then gb (after this cancellation) is an initial segment of g and hence is in the Schreier set C; thus $p(gb) = gb$ and $v = gb\,p(gb)^{-1} = 1$. Similarly if the last factor of g' is b, then $g'b^{-1}$ is an initial segment of g' and hence is in the Schreier set C; thus $p(g'b^{-1}) = g'b^{-1}$, and Lemma 7.11 gives $v^{-1} = \big(gb\,p(gb)^{-1}\big)^{-1} = g'b^{-1}p(g'b^{-1})^{-1} = 1$. Thus $v = gb\,p(gb)^{-1}$ either is reduced as written or is equal to 1.

Next let us see that an element v other than 1 determines its g and b. Suppose that $v = gb\,p(gb)^{-1} = g'b'p(g'b')^{-1}$, with g, g' in C and b, b' in S', is different from 1. Remembering that each of these expressions is reduced as written, we see that if g is shorter than g', then gb is an initial segment of g'. Since C is a Schreier set, gb is in C and $p(gb) = gb$; thus $v = gb\,p(gb)^{-1}$ equals 1, contradiction. Similarly g' cannot be shorter than g. So g and g' must have the same length l. In this case the first $l + 1$ factors must match in the two equal reduced words, and we conclude that $g = g'$ and $b = b'$. This proves the uniqueness.

We know that each v is either some u or some $u^{-1}$, and this uniqueness shows that it cannot be both unless $v = 1$. Therefore the nontrivial u’s are distinct, and the nontrivial v’s consist of the u’s and their inverses, each appearing once.

Since an element v not equal to 1 therefore determines its g and b, let us refer to the factor b of $v = gb\,p(gb)^{-1}$ as the significant factor of v. This is the part that will not cancel out when we pass from a product of v’s to its reduced form.

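Specifically suppose that we have $v = gb\,p(gb)^{-1}$ and $\bar v = \bar g\bar b\,p(\bar g\bar b)^{-1}$, that neither of these is 1, and that $\bar v \ne v^{-1}$. Put $g' = p(gb)$ and $\bar g' = p(\bar g\bar b)$. The claim is that the cancellation in forming $v\bar v = gbg'^{-1}\,\bar g\bar b\bar g'^{-1}$ does not extend to either of the significant factors b and $\bar b$. Since v and $\bar v$ are reduced as written, all cancellation takes place between $g'^{-1}$ and $\bar g$. If it reaches a significant factor, then one of three things happens:

(i) $g'^{-1}$ cancels completely and the b in $bg'^{-1}$ then cancels against $\bar g$, in which case $g'b^{-1}$ is an initial segment of $\bar g$, so that $g'b^{-1}$ is in C and $g'b^{-1} = p(g'b^{-1}) = g$ by Lemma 7.11, and $v = gbg'^{-1} = 1$, or

(ii) $\bar g$ cancels completely and the $\bar b$ in $\bar g\bar b$ then cancels against $g'^{-1}$, in which case $\bar g\bar b$ is an initial segment of g', so that $\bar g\bar b$ is in C and $\bar g\bar b = p(\bar g\bar b) = \bar g'$, and $\bar v = \bar g\bar b\bar g'^{-1} = 1$, or

(iii) $g'^{-1}\bar g = 1$ and $b\bar b = 1$, in which case $\bar g = g'$, $\bar b = b^{-1}$, and the middle conclusion of Lemma 7.11 allows us to conclude that $\bar v = v^{-1}$.

All three of these possibilities have been ruled out by our assumptions, and therefore neither of the significant factors in $v\bar v$ cancels.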

As a consequence of this noncancellation, we can see that in any product $v_1 \cdots v_m$ of v’s in which no $v_k$ is 1 and no $v_{k+1}$ equals $v_k^{-1}$, none of the significant factors cancel. In fact, the previous paragraph shows that the significant factors of $v_1$ and $v_2$ survive in forming $v_1v_2$, the significant factors of $v_2$ and $v_3$ survive in right multiplying by $v_3$, and so on. In particular, such a product with $m \ge 1$ is never equal to 1. Since the nontrivial u’s are distinct and the nontrivial v’s consist of the u’s and their inverses, each appearing once, we conclude that the set of nontrivial u’s is a free subset of F. Lemma 7.11 says that the u’s generate H, and therefore the set of nontrivial u’s is a free basis of H.
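The conclusion can be spot-checked by brute force in the running example, where the nontrivial u’s are $yx^{-1}$, $xx$, and $xy$. The following sketch (same assumed string encoding; the bound of 4 factors is arbitrary) forms every product $v_1 \cdots v_m$ with no $v_{k+1} = v_k^{-1}$, reduces it, and asserts that distinct products never collide and that each reduced result keeps at least m letters, one for each significant factor. A finite check of this kind illustrates the theorem but of course does not prove it.

```python
from itertools import product


def inverse(word):
    """Inverse of a word: reverse it and swap x <-> X, y <-> Y."""
    return word[::-1].swapcase()


def reduce_word(word):
    """Freely reduce by cancelling adjacent inverse letters."""
    stack = []
    for letter in word:
        if stack and stack[-1] == letter.swapcase():
            stack.pop()
        else:
            stack.append(letter)
    return "".join(stack)


U = ["yX", "xx", "xy"]                 # nontrivial u's in the running example
V = U + [inverse(u) for u in U]        # nontrivial v's: the u's and their inverses


def check_products(max_factors):
    """Reduce all admissible products of at most max_factors v's; return how many."""
    seen = {"": ()}                    # the empty product is 1
    for m in range(1, max_factors + 1):
        for seq in product(V, repeat=m):
            if any(seq[k + 1] == inverse(seq[k]) for k in range(m - 1)):
                continue               # skip products with some v_{k+1} = v_k^(-1)
            word = reduce_word("".join(seq))
            assert word not in seen, (seq, seen[word])  # distinct products differ
            assert len(word) >= m      # every significant factor survives
            seen[word] = seq
    return len(seen)


print(check_products(4))
```

The count returned is $1 + 6 + 30 + 150 + 750$: the empty product together with all $6 \cdot 5^{m-1}$ admissible products of m factors for $m = 1, \dots, 4$, all of them distinct.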


3. Free Products

The free abelian group on an index set S, as constructed in Section IV.9, has a universal mapping property that allows arbitrary functions from S into any target abelian group to be extended to homomorphisms of the free abelian group into the target group. The construction of free groups in Section 1 was arranged to adapt the construction so that the target group in the universal mapping property could be any group, abelian or nonabelian.

In this section we make a similar adaptation of the construction of a direct sum of abelian groups so that the result is applicable in a context of arbitrary groups. Proposition 4.17 gave the universal mapping property of the external direct sum $\bigoplus_{s \in S} G_s$ of a set of abelian groups with associated embedding homomorphisms $i_{s_0} : G_{s_0} \to \bigoplus_{s \in S} G_s$. The statement is that if H is any abelian group and $\{\varphi_s \mid s \in S\}$ is a system of group homomorphisms $\varphi_s : G_s \to H$, then there exists a unique group homomorphism $\varphi : \bigoplus_{s \in S} G_s \to H$ such that $\varphi \circ i_{s_0} = \varphi_{s_0}$ for all $s_0 \in S$. Example 2 of coproducts in Section IV.11 shows that direct sum is therefore the coproduct functor in the category of all abelian groups.

This universal mapping property of $\bigoplus_{s \in S} G_s$ fails when H is a nonabelian group such as the symmetric group $S_3$. In fact, $S_3$ has an element of order 2 and an element of order 3 and hence admits nontrivial homomorphisms $\varphi_2 : C_2 \to S_3$ and $\varphi_3 : C_3 \to S_3$. But there is no homomorphism $\varphi : C_2 \oplus C_3 \to S_3$ such that $\varphi \circ i_2 = \varphi_2$ and $\varphi \circ i_3 = \varphi_3$ because the image of φ has to be abelian but the images of $\varphi_2$ and $\varphi_3$ do not commute. Consequently direct sum cannot extend to a coproduct functor in the category of all groups.
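A quick computational check of the noncommutativity claim, representing permutations of {0, 1, 2} as Python tuples. The particular transposition and 3-cycle below are our own choice; any nontrivial $\varphi_2$ and $\varphi_3$ send the generators to some transposition and some 3-cycle, respectively.

```python
def compose(p, q):
    """Composition p∘q of permutations of {0, 1, 2}: apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))


def order(p):
    """Smallest k >= 1 with p^k equal to the identity."""
    identity = tuple(range(len(p)))
    k, power = 1, p
    while power != identity:
        power = compose(p, power)
        k += 1
    return k


a = (1, 0, 2)  # transposition swapping 0 and 1: image of the generator of C_2
c = (1, 2, 0)  # 3-cycle 0 -> 1 -> 2 -> 0: image of the generator of C_3

print(order(a), order(c), compose(a, c) == compose(c, a))
```

The printed line is `2 3 False`: the two images have orders 2 and 3 and do not commute.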

Instead, the appropriate group constructed from $C_2$ and $C_3$ for this kind of universal mapping property is the “free product” of $C_2$ and $C_3$, denoted by $C_2 * C_3$. In this section we construct the free product of any set of groups, finite or infinite. Also, we establish its universal mapping property and identify it in terms of generators and relations. The prototype of a free product is the free group F(S), which equals a free product of copies of $\mathbb{Z}$ indexed by S. A free product is always an infinite group if at least two of the factors are not 1-element groups.

An important application of free products occurs in the theory of the fundamental group in topology: if X is a topological space for which the theory of covering spaces is applicable, and if A and B are open subsets of X with $X = A \cup B$ such that $A \cap B$ is nonempty, connected, and simply connected, then the fundamental group of X is the free product of the fundamental group of A and the fundamental group of B. This result, together with a generalization that no longer requires $A \cap B$ to be simply connected, is known as the Van Kampen Theorem.

Let S be a nonempty set of groups $G_s$ for s in S. The set S is allowed to be infinite, but in practice it often has just two elements. We shall describe the group defined to be the free product $G = *_{s \in S} G_s$. We start from the set $W(\{G_s\})$ of all words built from the groups $G_s$. This consists of all finite sequences $g_1 \cdots g_n$ with each $g_i$ in some $G_s$ depending on i. The length of a word is the number of factors in it. The empty word is denoted by 1. We multiply two words by writing them end to end, and the resulting operation of multiplication is associative. A word is said to be equivalent to a second word if the first can be obtained from the second by a finite sequence of steps of the following kinds and their inverses:

(i) drop a factor for which $g_i$ is the identity element of the group in which it lies,

(ii) collapse two factors $g_ig_{i+1}$ to a single one $g^*$ if $g_i$ and $g_{i+1}$ lie in the same $G_s$ and their product in that group is $g^*$.

The result is an equivalence relation, and the set of equivalence classes is the underlying set of $*_{s \in S} G_s$.

Theorem 7.13. If S is a nonempty set of groups $G_s$ and $W(\{G_s\})$ is the set of all words from the groups $G_s$, then the product operation defined on $W(\{G_s\})$ descends in a well-defined fashion to the set $*_{s \in S} G_s$ of equivalence classes of members of $W(\{G_s\})$, and $*_{s \in S} G_s$ thereby becomes a group. For each $s_0$ in S, define $i_{s_0} : G_{s_0} \to *_{s \in S} G_s$ to be the group homomorphism obtained as the composition of the inclusion of $G_{s_0}$ into words of length 1 followed by passage to equivalence classes. Then the pair $\big(*_{s \in S} G_s, \{i_s\}\big)$ has the following universal mapping property: whenever H is a group and $\{\varphi_s \mid s \in S\}$ is a system of group homomorphisms $\varphi_s : G_s \to H$, then there exists a unique group homomorphism $\varphi : *_{s \in S} G_s \to H$ such that $\varphi \circ i_{s_0} = \varphi_{s_0}$ for all $s_0 \in S$.


Figure 7.2. Universal mapping property of a free product.

Remarks. The group $*_{s \in S} G_s$ is called the free product of the groups $G_s$. Figure 7.2 illustrates its universal mapping property. This universal mapping property actually characterizes $*_{s \in S} G_s$, as will be seen in Proposition 7.14. One often writes $G_1 * \cdots * G_n$ when the set S is finite; the order of listing the groups is immaterial. The proof of Theorem 7.13 is rather similar to the proof of Theorem 7.1, and we shall skip some details.

PROOF. Let us write $\sim$ for the equivalence relation on words, and let us denote equivalence classes by brackets. We want to define multiplication in $*_{s \in S} G_s$ by $[w_1][w_2] = [w_1w_2]$. To see that this formula makes sense in $*_{s \in S} G_s$, let x, x', and y be words in $W(\{G_s\})$, and suppose that x and x' differ by only one operation of type (i) or type (ii) as above. Then $x \sim x'$, and it is evident that $x'y \sim xy$ and $yx' \sim yx$. Iteration of this kind of relationship shows that $w_1' \sim w_1$ and $w_2' \sim w_2$ implies $w_1'w_2' \sim w_1w_2$, and hence multiplication is well defined.

The associativity of multiplication in $W(\{G_s\})$ implies that multiplication in $*_{s \in S} G_s$ is associative, and [1] is a two-sided identity. We readily check that if $g = g_1 \cdots g_n$ is a word, then the word $g^{-1} = g_n^{-1} \cdots g_1^{-1}$ has the property that $[g^{-1}]$ is a two-sided inverse to [g]. Therefore $*_{s \in S} G_s$ is a group.

The uniqueness of the homomorphism φ in the universal mapping property is no problem since all words are products of words of length 1 and since the subgroups $i_{s_0}(G_{s_0})$ together generate $*_{s \in S} G_s$.

For existence of φ, we begin by defining a function $\Phi : W(\{G_s\}) \to H$ such that

$\Phi(g_s) = \varphi_s(g_s)$ for $g_s$ in $G_s$ when viewed as a word of length 1,

$\Phi(w_1w_2) = \Phi(w_1)\Phi(w_2)$ for $w_1$ and $w_2$ in $W(\{G_s\})$.

We take the formulas $\Phi(g_s) = \varphi_s(g_s)$ for $g_s$ in $G_s$ as a definition of Φ on words of length 1. Any member of $W(\{G_s\})$ can be written uniquely as $g_1 \cdots g_n$ with each $g_i$ in $G_{s_i}$, and we set $\Phi(g_1 \cdots g_n) = \Phi(g_1) \cdots \Phi(g_n)$. (If $n = 0$, the understanding is that $\Phi(1) = 1$.) Then Φ has the required properties.

Let us show that $w' \sim w$ implies $\Phi(w') = \Phi(w)$. The questions are whether

(i) if $g_1, \dots, g_n$ are in various $G_s$'s with $g_i$ equal to the identity $1_{s_i}$ of $G_{s_i}$, then

$$\Phi(g_1 \cdots g_{i-1}\,1_{s_i}\,g_{i+1} \cdots g_n) = \Phi(g_1 \cdots g_{i-1}\,g_{i+1} \cdots g_n),$$

(ii) if $g_1, \dots, g_n$ are in various $G_s$'s with $G_{s_i} = G_{s_{i+1}}$ and if $g_ig_{i+1} = g^*$ in $G_{s_i}$, then

$$\Phi(g_1 \cdots g_{i-1}\,g_ig_{i+1}\,g_{i+2} \cdots g_n) = \Phi(g_1 \cdots g_{i-1}\,g^*\,g_{i+2} \cdots g_n).$$

In the case of (i), the question comes down to whether a certain $h\,\Phi(1_{s_i})\,h'$ in H equals $hh'$, and this is true because $\Phi(1_{s_i}) = \varphi_{s_i}(1_{s_i})$ is the identity of H. In the case of (ii), the question comes down to whether $h\,\Phi(g_i)\Phi(g_{i+1})\,h'$ equals $h\,\Phi(g^*)\,h'$ if $G_{s_i} = G_{s_{i+1}}$ and $g_ig_{i+1} = g^*$ in $G_{s_i}$, and this is true because

$$\Phi(g_i)\Phi(g_{i+1}) = \varphi_{s_i}(g_i)\varphi_{s_i}(g_{i+1}) = \varphi_{s_i}(g_ig_{i+1}) = \varphi_{s_i}(g^*) = \Phi(g^*).$$

We conclude that $w' \sim w$ implies $\Phi(w') = \Phi(w)$.

We may therefore define $\varphi([w]) = \Phi(w)$ for [w] in $*_{s \in S} G_s$, and φ is a homomorphism of $*_{s \in S} G_s$ into H as a consequence of the property $\Phi(w_1w_2) = \Phi(w_1)\Phi(w_2)$ of Φ on $W(\{G_s\})$. For $g_s$ in $G_s$, we have $\varphi([g_s]) = \Phi(g_s) = \varphi_s(g_s)$, i.e., $\varphi(i_s(g_s)) = \varphi_s(g_s)$. This completes the proof of existence. □
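The function Φ in this proof is easy to model when the factors are small. The sketch below assumes a free product $C_2 * C_3$ in which a letter is a pair (s, k) standing for the k-th power of the generator of $C_s$, together with hypothetical homomorphisms $\varphi_2$, $\varphi_3$ into $S_3$ that send the generators to a transposition and a 3-cycle. It evaluates Φ letter by letter and checks that steps of types (i) and (ii) do not change its value.

```python
from functools import reduce

IDENTITY = (0, 1, 2)
# Hypothetical phi_2, phi_3: images in S_3 of the generators of C_2 and C_3.
GENERATOR_IMAGE = {2: (1, 0, 2), 3: (1, 2, 0)}


def compose(p, q):
    """Product p q in S_3, read as composition: apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))


def phi(s, k):
    """phi_s of the k-th power of the generator of C_s."""
    result = IDENTITY
    for _ in range(k % s):
        result = compose(GENERATOR_IMAGE[s], result)
    return result


def Phi(word):
    """Phi(g_1 ... g_n) = phi(g_1) ... phi(g_n), with Phi(1) = identity."""
    return reduce(compose, (phi(s, k) for s, k in word), IDENTITY)


# A chain of equivalent words: each step is a type (ii) collapse or a type (i) drop.
chain = [
    [(2, 1), (3, 1), (3, 2), (2, 1)],
    [(2, 1), (3, 0), (2, 1)],   # (ii): collapse (3, 1)(3, 2) to (3, 0)
    [(2, 1), (2, 1)],           # (i): drop the identity (3, 0)
    [(2, 0)],                   # (ii): collapse (2, 1)(2, 1) to (2, 0)
    [],                         # (i): drop the identity (2, 0)
]
print([Phi(w) for w in chain])
```

Every word in the chain is sent to the identity permutation, as the proof requires for words equivalent to 1.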

Proposition 7.14. Let S be a nonempty set of groups $G_s$. Suppose that G' is a group and that $i'_s : G_s \to G'$ for $s \in S$ is a system of group homomorphisms with the following universal mapping property: whenever H is a group and $\{\varphi_s \mid s \in S\}$ is a system of group homomorphisms $\varphi_s : G_s \to H$, then there exists a unique group homomorphism $\varphi : G' \to H$ such that $\varphi \circ i'_s = \varphi_s$ for all $s \in S$. Then there exists a unique group homomorphism $\Phi : *_{s \in S} G_s \to G'$ such that $i'_s = \Phi \circ i_s$ for all $s \in S$. Moreover, Φ is a group isomorphism, and the homomorphisms $i'_s : G_s \to G'$ are one-one.

Remarks. As was true with Proposition 7.2, readers who have been through Chapter VI will recognize that Proposition 7.14 is a special case of Problem 19 at the end of that chapter.

PROOF. Put $G = *_{s \in S} G_s$. In the universal mapping property of Theorem 7.13, let $H = G'$ and $\varphi_s = i'_s$, and let $\Phi : G \to G'$ be the homomorphism φ produced by that theorem. Then Φ satisfies $\Phi \circ i_s = i'_s$ for all s. Reversing the roles of G and G', we obtain a homomorphism $\Phi' : G' \to G$ with $\Phi' \circ i'_s = i_s$ for all s. Therefore $(\Phi' \circ \Phi) \circ i_s = \Phi' \circ i'_s = i_s$.

Comparing $\Phi' \circ \Phi$ with the identity $1_G$ and applying the uniqueness in the universal mapping property for G, we see that $\Phi' \circ \Phi = 1_G$. Similarly the uniqueness in the universal mapping property of G' gives $\Phi \circ \Phi' = 1_{G'}$. Thus Φ is a group isomorphism. It is uniquely determined by the given properties since the various subgroups $i_s(G_s)$ generate G. Since $i'_s = \Phi \circ i_s$ and since Φ and $i_s$ are one-one, $i'_s$ is one-one. □

As was the case for free groups, we want a decision procedure for telling whether two given words in $W(\{G_s\})$ are equivalent. This is the so-called word problem for the free product. Solving it allows us to use free products concretely, just as Proposition 7.3 allowed us to use free groups concretely. A word in $W(\{G_s\})$ is said to be reduced if it

(i) contains no factor for which $g_i$ is the identity element of the group $G_s$ in which it lies,




(ii) contains no two consecutive factors $g_i$ and $g_{i+1}$ taken from the same group $G_s$.


Proposition 7.15 (solution of the word problem for free products). If S is a nonempty set of groups $G_s$ and $W(\{G_s\})$ is the set of all words from the groups $G_s$, then each word in $W(\{G_s\})$ is equivalent to one and only one reduced word.

Example. Consider the free product $C_2 * C_2$ of two cyclic groups, one with x as generator and the other with y as generator. Words consist of a finite sequence of factors of x, y, the identity of the first factor, and the identity of the second factor. A word is reduced if no factor is an identity and if no two x’s are adjacent and no two y’s are adjacent. Thus the reduced words consist of finite sequences whose terms are alternately x and y. Those of length $\le 3$ are 1, x, y, xy, yx, xyx, yxy, and in general there are two of each length $> 0$. The proposition tells us that all these reduced words give distinct members of $C_2 * C_2$. In particular, the group is infinite.
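The count of reduced words in this example is easy to confirm by enumeration. The following sketch treats x and y simply as symbols and filters strings over {x, y} by the two conditions for being reduced; identity factors are excluded from the start, and '1' stands for the empty word.

```python
from itertools import product


def reduced_words(max_len):
    """Reduced words of C_2 * C_2 up to length max_len, with '1' for the empty word.

    Identity factors are never used, so a string over {x, y} is reduced
    exactly when it contains no 'xx' and no 'yy'.
    """
    words = []
    for n in range(max_len + 1):
        for letters in product("xy", repeat=n):
            word = "".join(letters)
            if "xx" not in word and "yy" not in word:
                words.append(word or "1")
    return words


print(reduced_words(3))
```

The output lists 1, x, y, xy, yx, xyx, yxy, matching the words of length at most 3 given above.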

Remark. More generally, to test whether two words are equivalent, the proposition says to eliminate factors of the identity and multiply consecutive factors in each word when they come from the same group, and repeat these steps until it is no longer possible to do either of these operations on either word. Then each of the given words has been replaced by a reduced word, and the two given words are equivalent if and only if the two reduced words are identical. Problems 37-46 at the end of the chapter concern $C_2 * C_3$, and some of these problems make use of the result of this proposition: distinct reduced words are inequivalent.
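The procedure in the remark can be sketched directly. In this hypothetical encoding each factor of $C_2 * C_3$ is a pair (s, k) standing for the k-th power of the generator of the cyclic group $C_s$; the loop applies step (i) or step (ii) until neither applies, and two words are compared through their reduced forms, which Proposition 7.15 justifies.

```python
def reduce_word(word):
    """Apply steps (i) and (ii) until neither is possible.

    A factor (s, k) stands for the k-th power of the generator of C_s,
    so it is the identity exactly when k is divisible by s.
    """
    word = [(s, k % s) for s, k in word]                      # normalize exponents
    changed = True
    while changed:
        changed = False
        for i, (s, k) in enumerate(word):
            if k == 0:                                        # step (i): drop identity
                del word[i]
                changed = True
                break
            if i + 1 < len(word) and word[i + 1][0] == s:     # step (ii): collapse
                word[i:i + 2] = [(s, (k + word[i + 1][1]) % s)]
                changed = True
                break
    return word


def equivalent(w1, w2):
    """By Proposition 7.15, words are equivalent iff their reduced forms agree."""
    return reduce_word(w1) == reduce_word(w2)


print(reduce_word([(2, 1), (3, 2), (3, 2), (2, 0)]))
```

The example word collapses to $[(2, 1), (3, 1)]$, that is, to the reduced word consisting of the generator of $C_2$ followed by the generator of $C_3$.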

Proof of Proposition 7.15. Both operations, eliminating factors of the identity and multiplying consecutive factors in each word when they come from the same group, reduce the length of a word. Since the length has to remain $\ge 0$, the process of successively carrying out these two operations as much as possible has to stop after finitely many steps, and the result is a reduced word. This proves that each equivalence class of words contains a reduced word.

For uniqueness of the reduced word in an equivalence class, we proceed somewhat as with Proposition 7.3, associating to each word a finite sequence of reduced words such that the last member of the sequence is unchanged when we apply an operation to the word that preserves equivalence. However, there are considerably more details to check this time.

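If $w = g_1 \cdots g_n$ is a given word with each $g_i$ in $G_{s_i}$, then we associate to w the sequence of reduced words $x_0, x_1, \dots, x_n$ defined inductively by

$$x_0 = 1, \qquad x_1 = \begin{cases} g_1 & \text{if } g_1 \text{ is not the identity of } G_{s_1}, \\ 1 & \text{if } g_1 \text{ is the identity of } G_{s_1}, \end{cases}$$

and the following formula for $i \ge 2$ if $x_{i-1}$ is of the reduced form $h_1 \cdots h_k$ with $h_j$ in $G_{t_j}$:

$$x_i = \begin{cases} h_1 \cdots h_k\,g_i & \text{if } G_{s_i} \ne G_{t_k} \text{ and } g_i \text{ is not the identity } 1_{G_{s_i}} \text{ of } G_{s_i}, \\ h_1 \cdots h_k & \text{if } g_i \text{ is the identity } 1_{G_{s_i}} \text{ of } G_{s_i}, \\ h_1 \cdots h_{k-1} & \text{if } G_{t_k} = G_{s_i} \text{ with } h_k g_i = 1_{G_{s_i}}, \\ h_1 \cdots h_{k-1}\,g^* & \text{if } G_{t_k} = G_{s_i} \text{ with } h_k g_i = g^* \ne 1_{G_{s_i}}. \end{cases}$$

(If $x_{i-1} = 1$, so that $k = 0$, only the first two choices can arise, and the condition $G_{s_i} \ne G_{t_k}$ is to be ignored.)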


Put $r(w) = x_n$. We check inductively for $i \ge 0$ that each $x_i$ is reduced. In fact, $x_i$ for $i \ge 2$ begins in every case with $h_1 \cdots h_{k-1}$, which is assumed reduced. The only possible reduction for $x_i$ thus comes from factors that are adjoined or from interference with $h_{k-1}$, and all possibilities are addressed in the above choices. Thus $r(w) = x_n$ is necessarily reduced for each word w.

If $g_1 \cdots g_n$ is reduced as given, then $x_i$ is determined by the first possible choice $h_1 \cdots h_k\,g_i$ every time, and hence $x_i = g_1 \cdots g_i$ for all i. Therefore we obtain $r(w) = w$ if w is reduced.
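The inductive definition of $x_0, \dots, x_n$ amounts to a single left-to-right pass that treats the reduced word built so far as a stack. Here is a sketch in the same hypothetical (s, k) encoding for cyclic factors $C_s$, with one branch for each choice in the display:

```python
def r(word):
    """Compute r(w) = x_n by one left-to-right pass over w = g_1 ... g_n.

    Factors are pairs (s, k) meaning the k-th power of the generator of C_s;
    x holds the reduced word x_{i-1} = h_1 ... h_k as a list (a stack).
    """
    x = []                                  # x_0 = 1
    for s, k in word:
        k %= s
        if k == 0:                          # g_i is the identity: x_i = x_{i-1}
            continue
        if not x or x[-1][0] != s:          # G_{s_i} differs from G_{t_k}: append g_i
            x.append((s, k))
            continue
        _, h = x.pop()                      # same group: combine h_k g_i
        g_star = (h + k) % s
        if g_star != 0:                     # h_k g_i = g* != 1: replace h_k by g*
            x.append((s, g_star))
        # otherwise h_k g_i = 1, and h_k is simply dropped
    return x


print(r([(2, 1), (3, 1), (3, 2), (2, 1)]))
```

Unlike the repeated scanning described in the remark after Proposition 7.15, this pass looks at each factor once; the sample word is equivalent to 1, and the pass returns the empty list.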

Now consider the equivalent words

$$w = g_1 \cdots g_j\,g_{j+1} \cdots g_n \quad\text{and}\quad w' = g_1 \cdots g_j\,1_{G_s}\,g_{j+1} \cdots g_n.$$

Form $x_0, \dots, x_n$ for w and $x'_0, \dots, x'_{n+1}$ for w'. Then we have $x_j = x'_j$; let $h_1 \cdots h_k$ be a reduced form of $x'_j$. The formula for $x'_{j+1}$ is governed by the second choice in the display, and $x'_{j+1} = h_1 \cdots h_k = x_j$. Then $x'_{j+i+1} = x_{j+i}$ for $1 \le i \le n - j$ as well. Hence $x'_{n+1} = x_n$, and $r(w') = r(w)$.

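Next suppose that $g^* = g_jg_{j+1}$ in $G_{s_j}$, and consider the equivalent words

$$w = g_1 \cdots g_{j-1}\,g^*\,g_{j+2} \cdots g_n \quad\text{and}\quad w' = g_1 \cdots g_{j-1}\,g_j\,g_{j+1}\,g_{j+2} \cdots g_n.$$

As above, form the sequence $x_0, x_1, \dots$ for w and the sequence $x'_0, x'_1, \dots$ for w', noting that w has one fewer factor than w'. Then we have $x_{j-1} = x'_{j-1}$, and we let $h_1 \cdots h_k$ be a reduced form of $x_{j-1}$, with $h_k$ in $G_{t_k}$. There are cases, subcases, and subsubcases.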

First assume $G_{t_k} \ne G_{s_j}$. Then $x_j$ equals $h_1 \cdots h_k\,g^*$ or $h_1 \cdots h_k$ in the two subcases $g^* \ne 1_{G_{s_j}}$ and $g^* = 1_{G_{s_j}}$. In the first subcase, we have $g^* \ne 1_{G_{s_j}}$ and $x_j = h_1 \cdots h_k\,g^*$. Then $x'_j$ equals $h_1 \cdots h_k\,g_j$ or $h_1 \cdots h_k$ in the two subsubcases $g_j \ne 1_{G_{s_j}}$ and $g_j = 1_{G_{s_j}}$. In the first subsubcase, $x'_{j+1} = h_1 \cdots h_k\,g^* = x_j$ whether or not $g_{j+1} = 1_{G_{s_j}}$. In the second subsubcase, $g_{j+1} = g_jg_{j+1} = g^*$ cannot be $1_{G_{s_j}}$, and therefore $x'_{j+1} = h_1 \cdots h_k\,g^* = x_j$.

In the second subcase of the case $G_{t_k} \ne G_{s_j}$, we have $g^* = 1_{G_{s_j}}$ and $x_j = x_{j-1} = h_1 \cdots h_k$. Then $x'_j$ equals $h_1 \cdots h_k\,g_j$ or $h_1 \cdots h_k$ in the two subsubcases $g_j \ne 1_{G_{s_j}}$ and $g_j = 1_{G_{s_j}}$. In both subsubcases, $x'_{j+1} = h_1 \cdots h_k$, so that $x'_{j+1} = x_j$.

Now assume $G_{t_k} = G_{s_j}$. Then $x_j$ equals $h_1 \cdots h_{k-1}h^*_k$ or $h_1 \cdots h_{k-1}$ in the two subcases $h_kg^* = h^*_k \ne 1_{G_{s_j}}$ and $h_kg^* = 1_{G_{s_j}}$. In the first subcase, we have $h_kg^* = h^*_k \ne 1_{G_{s_j}}$ and $x_j = h_1 \cdots h_{k-1}h^*_k$. Then $x'_j$ equals $h_1 \cdots h_{k-1}h'_k$ or $h_1 \cdots h_{k-1}$ in the two subsubcases $h_kg_j = h'_k \ne 1_{G_{s_j}}$ and $h_kg_j = 1_{G_{s_j}}$. In the first subsubcase, $h'_kg_{j+1} = h_kg_jg_{j+1} = h_kg^* = h^*_k$ implies $x'_{j+1} = h_1 \cdots h_{k-1}h^*_k = x_j$. In the second subsubcase, we know that $h^*_k$ cannot be $1_{G_{s_j}}$ and hence that $g_{j+1} = h_kg_jg_{j+1} = h_kg^* = h^*_k$ cannot be $1_{G_{s_j}}$; since the reduced word $x'_j = h_1 \cdots h_{k-1}$ does not end in a factor from $G_{s_j}$, $x'_{j+1} = h_1 \cdots h_{k-1}h^*_k = x_j$.

In the second subcase of the case $G_{t_k} = G_{s_j}$, we have $h_kg^* = 1_{G_{s_j}}$ and $x_j = h_1 \cdots h_{k-1}$. Then $x'_j$ equals $h_1 \cdots h_{k-1}h'_k$ or $h_1 \cdots h_{k-1}$ in the two subsubcases $h_kg_j = h'_k \ne 1_{G_{s_j}}$ and $h_kg_j = 1_{G_{s_j}}$. In the first subsubcase, $g_{j+1}$ cannot be $1_{G_{s_j}}$, but $h'_kg_{j+1} = h_kg_jg_{j+1} = h_kg^* = 1_{G_{s_j}}$; hence $x'_{j+1} = h_1 \cdots h_{k-1} = x_j$. In the second subsubcase, $x'_j = h_1 \cdots h_{k-1}$ and $g_{j+1} = 1_{G_{s_j}}$, so that $x'_{j+1} = h_1 \cdots h_{k-1} = x_j$.

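We conclude that $x'_{j+1} = x_j$ in all cases. Since the factors of w' after $g_{j+1}$ are the factors of w after $g^*$, it follows that $x'_{j+i+1} = x_{j+i}$ for every $i \ge 0$ up to the end of w, so that the last terms of the two sequences agree and $r(w') = r(w)$. Since r is therefore constant on equivalence classes and $r(u) = u$ for every reduced word u, the only reduced word that is equivalent to w is r(w). □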

Proposition 7.16. Let S be a nonempty set of groups $G_s$, and suppose that $\langle S_s; R_s \rangle$ is a presentation of $G_s$, the sets $S_s$ being understood to be disjoint for $s \in S$. Then $\big\langle \bigcup_{s \in S} S_s;\ \bigcup_{s \in S} R_s \big\rangle$ is a presentation of the free product $*_{s \in S} G_s$.
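As a small illustration of the proposition, the bookkeeping amounts to taking unions after checking disjointness. The sketch below uses an assumed representation of our own (a presentation is a pair consisting of a set of generator names and a list of relators written as strings); applied to $C_2 = \langle x; x^2 \rangle$ and $C_3 = \langle y; y^3 \rangle$ it yields the presentation $\langle x, y; x^2, y^3 \rangle$ of $C_2 * C_3$.

```python
def free_product_presentation(presentations):
    """Combine presentations <S_s; R_s> into <union of S_s; union of R_s>.

    `presentations` maps each index s to a pair (set of generator names,
    list of relators written as strings); this encoding is only illustrative.
    """
    generators, relators = set(), []
    for gens, rels in presentations.values():
        if generators & gens:
            raise ValueError("the generating sets S_s must be disjoint")
        generators |= gens
        relators.extend(rels)
    return generators, relators


gens, rels = free_product_presentation({2: ({"x"}, ["xx"]), 3: ({"y"}, ["yyy"])})
print(sorted(gens), rels)
```

The disjointness check mirrors the hypothesis of the proposition, which is used in the proof to assign each relator to a unique $R_s$.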

Remark. One effect of this proposition is to make Proposition 7.8 available as a tool for use with free products. Using Proposition 7.8 may be easier than appealing to the universal mapping property in Theorem 7.13.

PROOF. Put $S = \bigcup_{s \in S} S_s$ and $R = \bigcup_{s \in S} R_s$, and define G to be a group given by generators and relations as $G = \langle S; R \rangle$. Consider the function from $S_s$ into the quotient group $G = F(S)/N(R)$ given by carrying x in $S_s$ into the word x in S and then passing to F(S) and its quotient G. Because of the universal mapping property of free groups, this function extends to a group homomorphism $\tilde\imath_s : F(S_s) \to G$. If r is a reduced word relative to $S_s$ representing a member of $R_s$, then r is carried by $\tilde\imath_s$ into a member of the larger set R and then into the identity of G. Since $\ker \tilde\imath_s$ is normal in $F(S_s)$, $\ker \tilde\imath_s$ contains the smallest normal subgroup $N(R_s)$ in $F(S_s)$ that contains $R_s$. Proposition 4.11 shows that $\tilde\imath_s$ descends to a group homomorphism $i_s : G_s \to G$.

We shall prove that $G$ and the system $\{\iota_s\}$ have the universal mapping property of Proposition 7.14 that characterizes a free product. Then it will follow from that proposition that $G \cong \ast_{s\in\mathcal S}\, G_s$, and the proof will be complete.

Thus let $H$ be a group, and let $\{\varphi_s \mid s \in \mathcal S\}$ be a system of group homomorphisms $\varphi_s : G_s \to H$. We are to produce a homomorphism $\Phi : G \to H$ such that $\Phi \circ \iota_s = \varphi_s$ for all $s$, and we are to prove that such a homomorphism is unique. Let $q_s : F(S_s) \to G_s$ be the quotient homomorphism, and define $\tilde\varphi_s : F(S_s) \to H$ by $\tilde\varphi_s = \varphi_s \circ q_s$. Now define $\tilde\Phi : S \to H$ as follows: if $x$ is in $S$, then $x$ is in a set $S_s$ for a unique $s$ and thereby defines a member of $F(S_s)$ for that unique $s$; $\tilde\Phi(x)$ is taken to be $\tilde\varphi_s(x)$. The universal mapping property of the free group $F(S)$ allows us to extend $\tilde\Phi$ to a group homomorphism, which we continue to call $\tilde\Phi$, of $F(S)$ into $H$. Let $r$ be a nontrivial relation in $R \subseteq F(S)$. Then $r$, by hypothesis of disjointness for the sets $S_s$, lies in a unique $R_s$. Hence $\tilde\Phi(r) = \tilde\varphi_s(r) = \varphi_s(q_s(r)) = \varphi_s(1_{G_s}) = 1_H$. Consequently the kernel of $\tilde\Phi$ contains the smallest normal subgroup $N(R)$ of $F(S)$ containing $R$, and $\tilde\Phi$ descends to a homomorphism $\Phi : G \to H$. This $\Phi$ satisfies

$$\Phi \circ \iota_s \circ q_s = \Phi \circ \tilde\iota_s = \tilde\Phi\big|_{F(S_s)} = \tilde\varphi_s = \varphi_s \circ q_s.$$

Since the quotient homomorphism $q_s$ is onto $G_s$, we obtain $\Phi \circ \iota_s = \varphi_s$, and existence of the homomorphism $\Phi$ is established.

For uniqueness, we observe that the identities $\Phi \circ \iota_s = \varphi_s$ imply that $\Phi$ is uniquely determined on the subgroup of $G$ generated by the images of all $\iota_s$. Since $q_s$ is onto $G_s$, this subgroup is the same as the subgroup generated by the images of all $\tilde\iota_s$. This subgroup contains the image in $G$ of every generator of $F(S)$ and hence is all of $G$. Thus $\Phi$ is uniquely determined. □


4. Group Representations

Group  representations  were  defined  in  Section  IV.6  as  group  actions  on  vector 
spaces  by  invertible  linear  functions.  The  underlying  field  of  the  vector  space 
will  be  taken  to  be  C  in  this  section  and  the  next,  and  the  theory  will  then  be 
especially  tidy.  The  subject  of  group  representations  is  one  that  uses  a  mix  of 
linear  algebra  and  group  theory  to  reveal  hidden  structure  within  group  actions.  It 
has  broad  applications  to  algebra  and  analysis,  but  we  shall  be  most  interested  in 
an  application  to  finite  groups  known  as  Burnside’s  Theorem  that  will  be  proved 
in  the  next  section. 

Let us begin with the abelian case, taking $G$ for the moment to be a finite abelian group. A multiplicative character of $G$ is a homomorphism $\chi : G \to S^1 \subseteq \mathbb C^\times$ of $G$ into the multiplicative group of complex numbers of absolute value 1. The multiplicative characters form an abelian group $\widehat G$ under pointwise multiplication of their complex values: $(\chi\chi')(g) = \chi(g)\chi'(g)$. The identity of $\widehat G$ is the multiplicative character that is identically 1 on $G$, and the inverse of $\chi$ is the complex conjugate of $\chi$.

The notion of multiplicative character adapts to the case of a finite group the familiar exponential functions $x \mapsto e^{inx}$ on the line, which can be regarded as multiplicative characters of the additive group $\mathbb R/2\pi\mathbb Z$ of real numbers modulo $2\pi$. These functions have long been used to resolve a periodic function of time into its component frequencies: the device is the Fourier series of the function $f$. If $f$ is periodic of period $2\pi$, then the Fourier coefficients of $f$ are $c_n = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x)e^{-inx}\,dx$, and the Fourier series of $f$ is the infinite series $\sum_{n=-\infty}^{\infty} c_ne^{inx}$. A portion of the subject of Fourier series looks for senses in which $f(x)$ is actually equal to the sum of its Fourier series. This is the problem of Fourier inversion.

A similar problem can be formulated when $\mathbb R/2\pi\mathbb Z$ is replaced by the finite abelian group $G$. The exponential functions are replaced by the multiplicative characters. One can form an analog of Fourier coefficients for the vector space $C(G, \mathbb C)$ of complex-valued functions² defined on $G$, and then one can form the analog of the Fourier series of the function. The problem of Fourier inversion becomes one of linear algebra, once we take into account the known structure of all finite abelian groups (Theorem 4.56). The result is as follows.

Theorem 7.17 (Fourier inversion formula for finite abelian groups). Let $G$ be a finite abelian group, and introduce an inner product on the complex vector space $C(G, \mathbb C)$ of all functions from $G$ to $\mathbb C$ by the formula

$$\langle F, F'\rangle = \sum_{g\in G} F(g)\overline{F'(g)},$$

the corresponding norm being $\|F\| = \langle F, F\rangle^{1/2}$. Then the members of $\widehat G$ form an orthogonal basis of $C(G, \mathbb C)$, each $\chi$ in $\widehat G$ satisfying $\|\chi\|^2 = |G|$. Consequently $|\widehat G| = |G|$, and any function $F : G \to \mathbb C$ is given by the “sum of its Fourier series”:

$$F(g) = \frac{1}{|G|}\sum_{\chi\in\widehat G}\Big(\sum_{h\in G} F(h)\overline{\chi(h)}\Big)\chi(g).$$

Remarks. This theorem is one of the ingredients in the proof in Chapter I of Advanced Algebra of Dirichlet’s theorem that if $a$ and $b$ are positive relatively prime integers, then there are infinitely many primes of the form $an + b$. In applications to engineering, the ordinary Fourier transform on the line is often approximated, for computational purposes, by a Fourier series on a large cyclic group, and then Theorem 7.17 is applicable. Such a Fourier series can be computed with unexpected efficiency using a special grouping of terms. This device is called the fast Fourier transform and is described in Problems 29-31 at the end of the chapter.

² The notation $C(G, \mathbb C)$ is to be suggestive of what happens for $G = S^1$ and for $G = \mathbb R^1$, where one works in part with the space of continuous complex-valued functions vanishing off a bounded set. In any event, pointwise multiplication makes $C(G, \mathbb C)$ into a commutative ring. Later in the section we introduce a second multiplication, called “convolution,” that makes $C(G, \mathbb C)$ into a ring in a different way. In Chapter VIII we shall introduce the “complex group algebra” $\mathbb CG$ of $G$. The vector space $C(G, \mathbb C)$ is the dual vector space of $\mathbb CG$. However, $C(G, \mathbb C)$ and $\mathbb CG$ are canonically isomorphic because they have distinguished bases, and the isomorphism respects the multiplication structures: convolution in $C(G, \mathbb C)$ and the group-algebra multiplication in $\mathbb CG$.
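The grouping of terms behind the fast Fourier transform can be sketched in Python for a cyclic group $\mathbb Z/N\mathbb Z$ with $N$ a power of 2. The sketch uses the standard radix-2 Cooley-Tukey recursion. It is given as an illustration and is not claimed to be the formulation of Problems 29-31. It computes the sums $\sum_h F(h)e^{-2\pi ihr/N}$, which are the inner products $\langle F, \chi_r\rangle$ with the characters $\chi_r(h) = e^{2\pi ihr/N}$. The function names are hypothetical.

```python
import cmath


def fft(values):
    """Radix-2 fast Fourier transform on the cyclic group Z/NZ.

    Returns the list whose r-th entry is sum_h values[h] * exp(-2*pi*i*h*r/N),
    i.e. the inner product <F, chi_r> for chi_r(h) = exp(2*pi*i*h*r/N).
    Assumes N = len(values) is a power of 2.
    """
    n = len(values)
    if n == 1:
        return [complex(values[0])]
    # Split the sum over Z/NZ into even and odd h. Each half is a transform
    # over Z/(N/2)Z and is computed recursively.
    even = fft(values[0::2])
    odd = fft(values[1::2])
    out = [0j] * n
    for r in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * r / n) * odd[r]
        out[r] = even[r] + twiddle
        out[r + n // 2] = even[r] - twiddle  # because exp(-pi*i) = -1
    return out


def naive_transform(values):
    """Direct O(N^2) evaluation of the same sums, for comparison."""
    n = len(values)
    return [sum(values[h] * cmath.exp(-2j * cmath.pi * h * r / n) for h in range(n))
            for r in range(n)]
```

Each level of the recursion costs $O(N)$ operations, and there are $\log_2 N$ levels. The whole transform therefore costs $O(N\log N)$ operations, compared with $O(N^2)$ for `naive_transform`.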

PROOF. For orthogonality let $\chi$ and $\chi'$ be distinct members of $\widehat G$, and put $\chi'' = \chi\overline{\chi'} = \chi\chi'^{-1}$. Choose $g_0$ in $G$ with $\chi''(g_0) \neq 1$. Then

$$\chi''(g_0)\Big(\sum_{g\in G}\chi''(g)\Big) = \sum_{g\in G}\chi''(g_0g) = \sum_{g\in G}\chi''(g),$$

so that $[1 - \chi''(g_0)]\sum_{g\in G}\chi''(g) = 0$, and therefore $\sum_{g\in G}\chi''(g) = 0$. Consequently $\langle\chi, \chi'\rangle = \sum_{g\in G}\chi(g)\overline{\chi'(g)} = \sum_{g\in G}\chi''(g) = 0$.

The orthogonality implies that the members of $\widehat G$ are linearly independent, and we obtain $|\widehat G| \le \dim C(G, \mathbb C) = |G|$. Certainly $\|\chi\|^2 = \sum_{g\in G}|\chi(g)|^2 = \sum_{g\in G} 1 = |G|$.

To see that the members of $\widehat G$ are a basis of $C(G, \mathbb C)$, we write $G$ as a direct sum of cyclic groups, by Theorem 4.56. A summand $\mathbb Z/m\mathbb Z$ has at least $m$ distinct multiplicative characters, given by $j \bmod m \mapsto e^{2\pi ijr/m}$ for $0 \le r \le m - 1$, and these characters extend to $G$ as 1 on the other direct summands of $G$. Taking products of such multiplicative characters from the different summands of $G$, we see that $|\widehat G| \ge |G|$. Therefore $|\widehat G| = |G|$, and $\widehat G$ is an orthogonal basis by Corollary 2.4. The formula for $F(g)$ in the statement of the theorem follows by applying Theorem 3.11c. □
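As a numerical check of Theorem 7.17, the following Python sketch takes the arbitrarily chosen group $G = \mathbb Z/2\mathbb Z \times \mathbb Z/3\mathbb Z$. It builds the characters as products of characters of the two cyclic summands, as in the proof, and reconstructs a function from its Fourier coefficients. The helper names are hypothetical.

```python
import cmath
import itertools


def characters(m, n):
    """Characters of G = Z/mZ x Z/nZ, each stored as a dict g -> chi(g).

    chi_(r,s)(a, b) = exp(2*pi*i*(a*r/m + b*s/n)) is the product of the
    character a -> exp(2*pi*i*a*r/m) of the first summand and the character
    b -> exp(2*pi*i*b*s/n) of the second.
    """
    group = list(itertools.product(range(m), range(n)))
    chars = [
        {(a, b): cmath.exp(2j * cmath.pi * (a * r / m + b * s / n)) for a, b in group}
        for r, s in group
    ]
    return group, chars


def inner(f1, f2, group):
    """<F, F'> = sum over g in G of F(g) * conj(F'(g))."""
    return sum(f1[g] * f2[g].conjugate() for g in group)


def fourier_inverse(F, group, chars):
    """Rebuild F as |G|^{-1} * sum over chi of <F, chi> * chi."""
    coeffs = [inner(F, chi, group) for chi in chars]
    return {g: sum(c * chi[g] for c, chi in zip(coeffs, chars)) / len(group)
            for g in group}
```

Up to rounding, `inner(chars[i], chars[j], group)` is 6 when `i == j` and 0 otherwise. Likewise, `fourier_inverse` returns the function it was given.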

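Now suppose that the finite group $G$ is not necessarily abelian. Since $S^1$ is abelian, Proposition 7.4 shows that every multiplicative character $\chi$ takes the value 1 on every member of the commutator subgroup $G'$ of $G$. Each $\chi$ is therefore constant on the cosets of $G'$, and so the multiplicative characters span a space of dimension at most $|G/G'|$, which is less than $|G|$ when $G$ is nonabelian. Consequently there is no way that the multiplicative characters can form a basis for the vector space $C(G, \mathbb C)$ of complex-valued functions on $G$. The above analysis thus breaks down, and some adjustment is needed in order to extend the theory.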

The remedy is to use representations, as defined in Section IV.6, on complex vector spaces of dimension > 1. We shall assume in the text that the vector space is finite-dimensional. The sense in which representations extend the theory of multiplicative characters is that any multiplicative character $\chi$ gives a representation $R$ on the 1-dimensional vector space $\mathbb C$ by $R(g)(z) = \chi(g)z$ for $g$ in $G$ and $z$ in $\mathbb C$. Conversely any 1-dimensional representation gives a multiplicative character: if $R$ is the representation on the 1-dimensional vector space $V$ and if $v_0 \neq 0$ is in $V$, then $\chi(g)$ is the scalar such that $R(g)v_0 = \chi(g)v_0$. It is enough to observe that the only elements of finite order in the multiplicative group $\mathbb C^\times$ are certain members of the circle $S^1$, and then it follows that $\chi$ takes values in $S^1$.
In the higher-dimensional case, the analog of the multiplicative character $\chi$ in passing to a 1-dimensional representation $R$ is a “matrix representation.” A matrix representation of $G$ is a function $g \mapsto [\rho(g)_{ij}]$ from $G$ into invertible square matrices of some given size $n$ such that $\rho(g_1g_2)_{ij} = \sum_{k=1}^{n}\rho(g_1)_{ik}\rho(g_2)_{kj}$. If a representation $R$ acts on the finite-dimensional complex vector space $V$, then the choice of an ordered basis $\Gamma = (v_1, \dots, v_n)$ for $V$ leads to a matrix representation by the formula

$$R(g)v_j = \sum_{i=1}^{n}\rho(g)_{ij}\,v_i,$$

that is, $[\rho(g)_{ij}]$ is the matrix of $R(g)$ relative to $\Gamma$. Conversely if a matrix representation $g \mapsto [\rho(g)_{ij}]$ and an ordered basis $\Gamma$ of $V$ are given, then the same formula may be used to obtain a representation $R$ of $G$ on $V$.

In contrast to the 1-dimensional case, the matrices that occur with a matrix representation of dimension > 1 need not be unitary. The correspondence between unitary linear maps and unitary matrices was discussed in Chapter III. When the finite-dimensional vector space $V$ has an inner product, a linear map was defined to be unitary if it satisfies the equivalent conditions of Proposition 3.18. A complex square matrix $A$ was defined to be unitary if $A^*A = I$. The matrix of a unitary linear map relative to an ordered orthonormal basis is unitary, and conversely when a unitary matrix and an ordered orthonormal basis are given, the associated linear map is unitary. We can thus speak of unitary representations and unitary matrix representations.
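To make the definitions concrete, here is a small Python sketch of a matrix representation of the symmetric group $S_3$ by $3\times 3$ permutation matrices, an example chosen here for illustration. It checks the defining identity $\rho(g_1g_2)_{ij} = \sum_k\rho(g_1)_{ik}\rho(g_2)_{kj}$. Since the entries are real, it also checks unitarity in the form $A^{\mathsf T}A = I$. The helper names are hypothetical.

```python
import itertools


def compose(p, q):
    """(p o q)(i) = p[q[i]] for permutations stored as tuples."""
    return tuple(p[q[i]] for i in range(len(q)))


def perm_matrix(p):
    """Matrix with a 1 in row p[j] of column j, so it sends e_j to e_{p[j]}."""
    n = len(p)
    return [[1 if p[j] == i else 0 for j in range(n)] for i in range(n)]


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transpose(a):
    return [list(row) for row in zip(*a)]


S3 = list(itertools.permutations(range(3)))
identity = [[int(i == j) for j in range(3)] for i in range(3)]

# rho(g1 g2)_ij = sum_k rho(g1)_ik rho(g2)_kj for all g1, g2 in S_3.
is_matrix_rep = all(perm_matrix(compose(p, q)) == matmul(perm_matrix(p), perm_matrix(q))
                    for p in S3 for q in S3)
# The entries are real, so A^* = A^T, and unitarity means A^T A = I.
is_unitary = all(matmul(transpose(perm_matrix(p)), perm_matrix(p)) == identity for p in S3)
```

Both `is_matrix_rep` and `is_unitary` evaluate to `True`. The comparisons are exact because every entry is an integer.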

Some examples of representations appear in Section IV.6. One further pair of examples will be of interest to us. With the finite group $G$ fixed but not necessarily abelian, we continue to let $C(G, \mathbb C)$ be the complex vector space of all functions $f : G \to \mathbb C$. We define two representations of $G$ on $C(G, \mathbb C)$: the left regular representation $\ell$ given by $(\ell(g)f)(x) = f(g^{-1}x)$, and the right regular representation $r$ given by $(r(g)f)(x) = f(xg)$. The reason for the presence of an inverse in one case and not the other was discussed in Section IV.6. Relative to the inner product

$$(f_1, f_2) = \sum_{x\in G} f_1(x)\overline{f_2(x)},$$

both $\ell$ and $r$ are unitary. The argument for $\ell$ is that

$$(\ell(g)f_1, \ell(g)f_2) = \sum_{x\in G}(\ell(g)f_1)(x)\overline{(\ell(g)f_2)(x)} = \sum_{x\in G} f_1(g^{-1}x)\overline{f_2(g^{-1}x)} = \sum_{y\in G} f_1(y)\overline{f_2(y)} = (f_1, f_2),$$

the third equality coming from the change of variables $y = g^{-1}x$. The argument for $r$ is completely analogous.
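The following Python sketch checks the claims just made for $G = S_3$, an example chosen here for illustration. It verifies that $\ell$ and $r$ are homomorphisms and that they preserve the inner product. Functions on $G$ are stored as dictionaries, and the sample functions `f1` and `f2` are arbitrary.

```python
import itertools
import random

S3 = list(itertools.permutations(range(3)))  # elements of S_3 as tuples


def mul(p, q):
    """Group law of S_3: (pq)(i) = p(q(i))."""
    return tuple(p[q[i]] for i in range(3))


def inv(p):
    """Inverse permutation."""
    result = [0] * 3
    for i, pi in enumerate(p):
        result[pi] = i
    return tuple(result)


def left(g, f):
    """Left regular representation: (l(g) f)(x) = f(g^{-1} x)."""
    return {x: f[mul(inv(g), x)] for x in S3}


def right(g, f):
    """Right regular representation: (r(g) f)(x) = f(x g)."""
    return {x: f[mul(x, g)] for x in S3}


def inner(f1, f2):
    """(f1, f2) = sum over x of f1(x) * conj(f2(x))."""
    return sum(f1[x] * f2[x].conjugate() for x in S3)


random.seed(0)
f1 = {x: complex(random.random(), random.random()) for x in S3}
f2 = {x: complex(random.random(), random.random()) for x in S3}

# l(gh) = l(g) l(h) and r(gh) = r(g) r(h).
homomorphic = all(
    left(mul(g, h), f1) == left(g, left(h, f1))
    and right(mul(g, h), f1) == right(g, right(h, f1))
    for g in S3 for h in S3
)
# Unitarity: each l(g) and r(g) preserves the inner product (up to rounding).
unitary = all(
    abs(inner(left(g, f1), left(g, f2)) - inner(f1, f2)) < 1e-9
    and abs(inner(right(g, f1), right(g, f2)) - inner(f1, f2)) < 1e-9
    for g in S3
)
```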
It will be convenient to abbreviate “representation $R$ on $V$” as “representation $(R, V)$.” Let $(R, V)$ be a representation of the finite group $G$ on a finite-dimensional complex vector space. An invariant subspace $U$ of $V$ is a vector subspace such that $R(g)U \subseteq U$ for all $g$ in $G$. The representation is irreducible if $V \neq 0$ and if $V$ has no invariant subspaces other than 0 and $V$.

Two representations $(R_1, V_1)$ and $(R_2, V_2)$ on finite-dimensional complex vector spaces are equivalent if there exists a linear invertible function $A : V_1 \to V_2$ such that $AR_1(g) = R_2(g)A$ for all $g$ in $G$. In the terminology of Section IV.11, “equivalent” is the notion of “is isomorphic to” in the category of all finite-dimensional representations of $G$.

In more detail, a morphism from $(R_1, V_1)$ to $(R_2, V_2)$ in this category is an intertwining operator, namely a linear map $A : V_1 \to V_2$ such that $AR_1(g) = R_2(g)A$ for all $g$ in $G$. The condition for this equality to hold is that the diagram in Figure 7.3 commute.

[Figure 7.3: a square with $A : V_1 \to V_2$ along the top and bottom edges and $R_1(g)$, $R_2(g)$ as the vertical maps.]

Figure 7.3. An intertwining operator for two representations, i.e., a morphism in the category of finite-dimensional representations of $G$.


An example of a pair of representations that are equivalent is the left and right regular representations of $G$ on $C(G, \mathbb C)$: in fact, if we define $(Af)(x) = f(x^{-1})$, then

$$(\ell(g)Af)(x) = (Af)(g^{-1}x) = f(x^{-1}g) = (r(g)f)(x^{-1}) = (Ar(g)f)(x).$$
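Here is a short Python check of this identity for $G = S_3$, chosen for illustration. The operator $A$ is implemented as a hypothetical helper `flip`, and the sample function `f` is arbitrary.

```python
import itertools

S3 = list(itertools.permutations(range(3)))


def mul(p, q):
    """Group law of S_3: (pq)(i) = p(q(i))."""
    return tuple(p[q[i]] for i in range(3))


def inv(p):
    result = [0] * 3
    for i, pi in enumerate(p):
        result[pi] = i
    return tuple(result)


def left(g, f):
    """(l(g) f)(x) = f(g^{-1} x)"""
    return {x: f[mul(inv(g), x)] for x in S3}


def right(g, f):
    """(r(g) f)(x) = f(x g)"""
    return {x: f[mul(x, g)] for x in S3}


def flip(f):
    """The intertwining operator (A f)(x) = f(x^{-1})."""
    return {x: f[inv(x)] for x in S3}


f = {x: 10 * x[0] + x[1] for x in S3}  # an arbitrary sample function

intertwines = all(left(g, flip(f)) == flip(right(g, f)) for g in S3)  # l(g) A = A r(g)
invertible = flip(flip(f)) == f  # A is its own inverse
```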


Proposition 7.18 (Schur’s Lemma). If $(R_1, V_1)$ and $(R_2, V_2)$ are irreducible representations of the finite group $G$ on finite-dimensional complex vector spaces and if $A : V_1 \to V_2$ is an intertwining operator, then $A$ is invertible (and hence exhibits $R_1$ and $R_2$ as equivalent) or else $A = 0$. If $(R_1, V_1) = (R_2, V_2)$ and $A : V_1 \to V_2$ is an intertwining operator, then $A$ is scalar.

Remark. The conclusion that $A$ is scalar makes essential use of the fact that the underlying field is $\mathbb C$.

PROOF. The equality $R_2(g)Av_1 = AR_1(g)v_1$ shows that $\ker A$ and $\operatorname{image} A$ are invariant subspaces. By the assumed irreducibility, $\ker A$ equals 0 or $V_1$, and $\operatorname{image} A$ equals 0 or $V_2$. The first statement follows. When $(R_1, V_1) = (R_2, V_2)$, the identity $I : V_1 \to V_2$ is an intertwining operator. Since $\mathbb C$ is algebraically closed and $V_1 \neq 0$, $A$ has an eigenvalue $\lambda$, and then $A - \lambda I$ is another intertwining operator. Since $A - \lambda I$ is not invertible, the first statement shows that $A - \lambda I$ must be 0. □
Corollary 7.19. Every irreducible finite-dimensional representation of a finite abelian group $G$ is 1-dimensional.

PROOF. If $(R, V)$ is given, then the linear map $A = R(g)$ satisfies $AR(x) = R(gx) = R(xg) = R(x)A$ for all $x$ in $G$. By Schur’s Lemma (Proposition 7.18), $A = R(g)$ is scalar. Since $g$ is arbitrary, every vector subspace of $V$ is invariant. Irreducibility therefore implies that $V$ is 1-dimensional. □

Let $R$ be a representation of the finite group $G$ on the finite-dimensional complex vector space $V$, let $(\cdot,\cdot)_0$ be any inner product on $V$, and define

$$(v_1, v_2) = \sum_{x\in G}(R(x)v_1, R(x)v_2)_0.$$

This is again an inner product, since each term $(R(x)v, R(x)v)_0$ is $\ge 0$ and the term for $x = 1$ is $(v, v)_0$. Then we have

$$\begin{aligned}
(R(g)v_1, R(g)v_2) &= \sum_{x\in G}(R(x)R(g)v_1, R(x)R(g)v_2)_0 \\
&= \sum_{x\in G}(R(xg)v_1, R(xg)v_2)_0 \\
&= \sum_{y\in G}(R(y)v_1, R(y)v_2)_0 \quad\text{by the change } y = xg \\
&= (v_1, v_2).
\end{aligned}$$

With respect to the inner product $(\cdot,\cdot)$, the representation $(R, V)$ is therefore unitary. In other words, we are always free to introduce an inner product to make a given finite-dimensional representation unitary. The significance of this construction is noted in the following proposition.
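The averaging construction can be tried on a concrete case. In the Python sketch below, which illustrates the idea under choices of our own, the permutation representation of $S_3$ is conjugated by a non-unitary matrix `B`. The result is a representation that is not unitary for the standard inner product $(v, w)_0 = \sum_i v_i\overline{w_i}$. If the averaged inner product is written as $(v, w) = w^*Mv$ with $M = \sum_x R(x)^*R(x)$, then invariance amounts to $R(g)^*MR(g) = M$. All entries are integers, so the checks are exact.

```python
import itertools


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def perm_matrix(p):
    n = len(p)
    return [[1 if p[j] == i else 0 for j in range(n)] for i in range(n)]


S3 = list(itertools.permutations(range(3)))

# A non-unitary change of basis B (an arbitrary choice) and its inverse.
B = [[1, 1, 0], [0, 1, 0], [0, 0, 1]]
B_inv = [[1, -1, 0], [0, 1, 0], [0, 0, 1]]

# R(g) = B P(g) B^{-1} is again a representation of S_3.
R = {g: matmul(matmul(B, perm_matrix(g)), B_inv) for g in S3}

# Gram matrix of (v, w) = sum_x (R(x)v, R(x)w)_0, i.e. (v, w) = w^* M v with
# M = sum_x R(x)^* R(x). The entries are real, so ^* is the transpose.
M = [[0] * 3 for _ in range(3)]
for x in S3:
    term = matmul(transpose(R[x]), R[x])
    M = [[M[i][j] + term[i][j] for j in range(3)] for i in range(3)]

identity = [[int(i == j) for j in range(3)] for i in range(3)]
unitary_before = all(matmul(transpose(R[g]), R[g]) == identity for g in S3)
unitary_after = all(matmul(matmul(transpose(R[g]), M), R[g]) == M for g in S3)
```

Here `unitary_before` is `False`; for the transposition `(1, 0, 2)`, for example, $R(g)^{\mathsf T}R(g) \neq I$. By contrast, `unitary_after` is `True`.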

Proposition 7.20. If $(R, V)$ is a finite-dimensional representation of the finite group $G$ and if an inner product is introduced in $V$ that makes the representation unitary, then the orthogonal complement of an invariant subspace is invariant.

PROOF. Let $U$ be an invariant subspace. If $u$ is in $U$ and $u^\perp$ is in $U^\perp$, then $(R(g)u^\perp, u) = (R(g)^{-1}R(g)u^\perp, R(g)^{-1}u) = (u^\perp, R(g)^{-1}u) = 0$, the last equality holding because $R(g)^{-1}u = R(g^{-1})u$ is in $U$. Thus $u^\perp$ in $U^\perp$ implies $R(g)u^\perp$ is in $U^\perp$. □

Corollary  7.21.  Any  finite-dimensional  representation  of  the  finite  group  G 
is  a  direct  sum  of  irreducible  representations. 

Remark.  That  is,  we  can  find  a  system  of  invariant  subspaces  such  that  the 
action  of  G  is  irreducible  on  each  of  these  subspaces  and  such  that  the  whole 
vector  space  is  the  direct  sum  of  these  subspaces. 
PROOF. This is immediate by induction on the dimension. For dimension 0, the representation is the empty direct sum of irreducible representations. Suppose the decomposition is known for dimension $< n$, that $\dim V = n \ge 1$, and that $U$ is an invariant subspace under $R$ of smallest possible dimension $\ge 1$. Then $U$ is irreducible under $R$, and Proposition 7.20 (applied with an inner product making $R$ unitary) says that the subspace $U^\perp$, which satisfies $V = U \oplus U^\perp$, is invariant. It is therefore enough to decompose $U^\perp$, and induction achieves such a decomposition. □

Proposition 7.22 (Schur orthogonality). For finite-dimensional representations of a finite group $G$ in which inner products have been introduced to make the representations unitary,

(a) if $(R_1, V_1)$ and $(R_2, V_2)$ are inequivalent and irreducible, then

$$\sum_{x\in G}(R_1(x)v_1, v_1')\overline{(R_2(x)v_2, v_2')} = 0 \quad\text{for all } v_1, v_1' \in V_1 \text{ and } v_2, v_2' \in V_2,$$

(b) if $(R, V)$ is irreducible, then

$$\sum_{x\in G}(R(x)v_1, v_1')\overline{(R(x)v_2, v_2')} = \frac{|G|\,(v_1, v_2)\overline{(v_1', v_2')}}{\dim V} \quad\text{for } v_1, v_2, v_1', v_2' \in V.$$

Remarks. If $G$ is abelian, then $V_1$ and $V_2$ in (a) are 1-dimensional, and the conclusion of (a) reduces to the statement that the multiplicative characters are orthogonal. Conclusion (b) in this case reduces to a trivial statement.

PROOF. For (a), let $l : V_2 \to V_1$ be any linear map, and form the linear map

$$L = \sum_{x\in G} R_1(x)\,l\,R_2(x^{-1}).$$

Multiplying on the left by $R_1(g)$ and on the right by $R_2(g^{-1})$ and changing variables in the sum, we obtain $R_1(g)LR_2(g^{-1}) = L$, so that $R_1(g)L = LR_2(g)$ for all $g \in G$. By Schur’s Lemma (Proposition 7.18) and the assumed irreducibility and inequivalence, $L = 0$. Thus $(Lv_2', v_1') = 0$. For the particular choice of $l$ as $l(w_2) = (w_2, v_2)v_1$, we have

$$\begin{aligned}
0 = (Lv_2', v_1') &= \sum_{x\in G}(R_1(x)\,l\,R_2(x^{-1})v_2', v_1') \\
&= \sum_{x\in G}\big(R_1(x)(R_2(x^{-1})v_2', v_2)v_1, v_1'\big) = \sum_{x\in G}(R_1(x)v_1, v_1')(R_2(x^{-1})v_2', v_2),
\end{aligned}$$

and (a) results since $(R_2(x^{-1})v_2', v_2) = \overline{(R_2(x)v_2, v_2')}$.

For (b), we proceed in the same way, starting from $l : V \to V$, and we obtain $L = \lambda I$ from Schur’s Lemma. Taking the trace of both sides, we find that

$$\lambda\dim V = \operatorname{Tr} L = |G|\operatorname{Tr} l.$$

Therefore $\lambda = |G|(\operatorname{Tr} l)/\dim V$. Since $L = \lambda I$, we have $(\operatorname{Tr} l/\dim V)\,I = |G|^{-1}L$. Again we make the particular choice of $l$ as $l(w_2) = (w_2, v_2)v_1$. Since $\operatorname{Tr} l = (v_1, v_2)$, we obtain

$$\begin{aligned}
\frac{(v_1, v_2)\overline{(v_1', v_2')}}{\dim V} &= \frac{\operatorname{Tr} l}{\dim V}(v_2', v_1') = |G|^{-1}(Lv_2', v_1') \\
&= |G|^{-1}\sum_{x\in G}(R(x)\,l\,R(x^{-1})v_2', v_1') \\
&= |G|^{-1}\sum_{x\in G}\big(R(x)(R(x^{-1})v_2', v_2)v_1, v_1'\big) \\
&= |G|^{-1}\sum_{x\in G}(R(x)v_1, v_1')(R(x^{-1})v_2', v_2),
\end{aligned}$$

and (b) results since $(R(x^{-1})v_2', v_2) = \overline{(R(x)v_2, v_2')}$. □
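For a numerical illustration of Proposition 7.22, the Python sketch below uses the 2-dimensional representation of the dihedral group $D_3$ described in the Example later in this section. Its irreducibility is a standard fact assumed here. With the standard basis, $\rho(x)_{ij} = (R(x)v_j, v_i)$, so (b) reads $\sum_x\rho(x)_{ij}\overline{\rho(x)_{kl}} = (|G|/d)\,\delta_{ik}\delta_{jl}$. Part (a) is tested against the two 1-dimensional representations, trivial and sign. Elements $x^ky^e$ are encoded as pairs `(k, e)`, and the helper names are hypothetical.

```python
import math

N = 3                                             # the group is D_3, of order 2N = 6
G = [(k, e) for k in range(N) for e in range(2)]  # (k, e) encodes x^k y^e
d = 2                                             # dimension of the representation


def rho(k, e):
    """Matrix of x^k y^e: rotation by 2*pi*k/N times diag(1, -1)^e."""
    c, s = math.cos(2 * math.pi * k / N), math.sin(2 * math.pi * k / N)
    sign = -1 if e else 1
    return [[c, -s * sign], [s, c * sign]]


def schur_b(i, j, k, l):
    """Sum over x of rho(x)_ij * conj(rho(x)_kl); the entries are real."""
    return sum(rho(*x)[i][j] * rho(*x)[k][l] for x in G)


def trivial(k, e):
    return 1


def sign_rep(k, e):
    return -1 if e else 1


def schur_a(i, j, chi):
    """Sum over x of rho(x)_ij * conj(chi(x)) for a real 1-dimensional character chi."""
    return sum(rho(*x)[i][j] * chi(*x) for x in G)
```

Up to rounding, `schur_b(i, j, k, l)` equals $3 = |G|/d$ when `i == k` and `j == l` and equals 0 otherwise. Every value `schur_a(i, j, chi)` is 0.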


Let us interpret Proposition 7.22 as a statement about the left and right regular representations $\ell$ and $r$ of $G$ on the inner-product space $C(G, \mathbb C)$, the inner product being $(f, f') = \sum_{x\in G} f(x)\overline{f'(x)}$. Let $R$ be an irreducible representation of $G$ on the finite-dimensional vector space $V$, and introduce an inner product to make it unitary. A member of $C(G, \mathbb C)$ of the form $g \mapsto (R(g)v, v')$ is called a matrix coefficient of $R$. Let $v_1, \dots, v_n$ be an orthonormal basis of $V$. The matrix representation of $G$ that corresponds to $R$ and this choice of orthonormal basis has $\rho(g)_{ij} = (R(g)v_j, v_i)$, and hence the entries of $[\rho(g)_{ij}]$, as functions on $G$, provide examples of matrix coefficients. These particular matrix coefficients are orthogonal, according to Proposition 7.22b, with

$$\sum_{g\in G}|\rho(g)_{ij}|^2 = \sum_{g\in G}(R(g)v_j, v_i)\overline{(R(g)v_j, v_i)} = \frac{|G|\,(v_j, v_j)\overline{(v_i, v_i)}}{\dim V} = \frac{|G|}{\dim V}.$$

Thus the functions $\sqrt{\dim V/|G|}\,\rho(x)_{ij}$ form an orthonormal basis of an $n^2$-dimensional subspace $V_R$ of $C(G, \mathbb C)$, where $n = \dim V$. The vector subspace $V_R$ has the following properties:

(i) All matrix coefficients of $R$ are in $V_R$, as is seen by expanding $v = \sum_j c_jv_j$ and $v' = \sum_i d_iv_i$ and obtaining $(R(g)v, v') = \sum_{i,j} c_j\overline{d_i}(R(g)v_j, v_i) = \sum_{i,j} c_j\overline{d_i}\,\rho(g)_{ij}$.
(ii) $V_R$ is invariant under $\ell$ and $r$ because

$$\begin{aligned}
\ell(g)(R(\cdot)v, v')(x) &= (R(g^{-1}x)v, v') = (R(x)v, R(g)v'), \\
r(g)(R(\cdot)v, v')(x) &= (R(xg)v, v') = (R(x)R(g)v, v').
\end{aligned}$$

(iii) Any representation $R'$ equivalent to $R$ has $V_{R'} = V_R$.

Let us see how $V_R$ decomposes into irreducible subspaces under $r$. The computation with $r$ in (ii) above shows, for each $i$, that the vector space of all functions $x \mapsto (R(x)v, v_i)$ for $v \in V$ is invariant under $r$. This is the linear span of the matrix coefficients obtained from the $i$th row of $[\rho(x)_{ij}]$. Define a linear map $A$ from $V$ into this vector space by $Av = (R(\cdot)v, v_i)$. It is evident that $A$ is onto. It is one-one because, if $(R(x)v, v_i) = 0$ for all $x$, then $v$ is orthogonal to the vectors $R(x)^{-1}v_i$, which span a nonzero invariant subspace and hence all of $V$. Moreover $AR(g)v = (R(\cdot)R(g)v, v_i) = r(g)(R(\cdot)v, v_i) = r(g)Av$. Thus $A$ exhibits this space, with $r$ as representation, as equivalent to $(R, V)$. The space $V_R$ is the direct sum of these spaces over $i$, and the summands are orthogonal, according to Proposition 7.22b. Thus $V_R$ decomposes under $r$ as the direct sum of $\dim V$ irreducible subspaces, each one equivalent to $(R, V)$.

One can make a similar analysis with $\ell$, using columns in place of rows. However, this analysis is a little more subtle since $V_R$, acted upon by $\ell$, is the direct sum of $\dim V$ copies of the “contragredient” of $(R, V)$, rather than $(R, V)$ itself. The details are left to Problems 32-36 at the end of the chapter.

As $R$ varies over inequivalent representations, these vector spaces $V_R$ are orthogonal, according to Proposition 7.22a. The claim is that their direct sum is the space $C(G, \mathbb C)$ of all functions on $G$. We argue by contradiction. The sum is invariant under $r$, and if it is not all of $C(G, \mathbb C)$, then we can find a nonzero vector subspace $U = \{f(\cdot)\}$ of $C(G, \mathbb C)$ orthogonal to all the spaces $V_R$ such that $U$ is invariant and irreducible under $r$. Let $u_1, \dots, u_m$ be an orthonormal basis of $U$. Then each function $x \mapsto (r(x)u_j, u_i)$ is a matrix coefficient of the irreducible representation $(r, U)$ and hence is orthogonal to $U$ by construction, i.e.,

$$0 = \sum_{x\in G}(r(x)u_j, u_i)\overline{f(x)} \quad\text{for all } f \text{ in } U.$$

Applying the Riesz Representation Theorem (Theorem 3.12), choose a member $e$ of $U$ such that $f(1) = (f, e)$ for all $f$ in $U$. By definition of $r(x)$ and $e$, we find that

$$u(x) = (r(x)u)(1) = (r(x)u, e)$$

for all $u$ in $U$. Substitution and use once more of Proposition 7.22b gives

$$0 = \sum_{x\in G}(r(x)u_j, u_i)\overline{(r(x)u, e)} = \frac{|G|\,(u_j, u)\overline{(u_i, e)}}{\dim U}$$

for all $i$ and $j$. Since we can take $u = u_j$ and since $i$ is arbitrary, this equation forces $e = 0$. Then $u(x) = (r(x)u, e) = 0$ for all $u$ in $U$ and $x$ in $G$, contradicting $U \neq 0$. We conclude that the sum of all the spaces $V_R$ is all of $C(G, \mathbb C)$. Let us state the result as a theorem.
Theorem 7.23. For the finite group $G$, let $\{(R_\alpha, U_\alpha)\}$ be a complete set of inequivalent irreducible finite-dimensional representations of $G$, and let $V_{R_\alpha}$ be the linear span of the matrix coefficients of $R_\alpha$. Then

(a) the spaces $V_{R_\alpha}$ are mutually orthogonal and are invariant under the left and right regular representations $\ell$ and $r$,

(b) the representation $(r, V_{R_\alpha})$ is equivalent to the direct sum of $\dim U_\alpha$ copies of $(R_\alpha, U_\alpha)$,

(c) the direct sum of the spaces $V_{R_\alpha}$ is the space $C(G, \mathbb C)$ of all complex-valued functions on $G$.

Moreover,

(d) the number of $R_\alpha$’s is finite,

(e) $\dim V_{R_\alpha} = (\dim U_\alpha)^2$,

(f) any irreducible subspace of $(r, C(G, \mathbb C))$ that is equivalent to $(R_\alpha, U_\alpha)$ is contained in $V_{R_\alpha}$.


Corollary 7.24. Let $\{(R_\alpha, U_\alpha)\}$ be a complete set of inequivalent irreducible finite-dimensional representations of the finite group $G$, and let $d_\alpha = \dim U_\alpha$. In each $U_\alpha$, introduce an inner product making $(R_\alpha, U_\alpha)$ unitary. For each $\alpha$, let $\{v_1^{(\alpha)}, \dots, v_{d_\alpha}^{(\alpha)}\}$ be an orthonormal basis of $U_\alpha$. Then the functions in $C(G, \mathbb C)$ given by $x \mapsto \sqrt{|G|^{-1}d_\alpha}\,(R_\alpha(x)v_j^{(\alpha)}, v_i^{(\alpha)})$ form an orthonormal basis of $C(G, \mathbb C)$. Consequently every $f$ in $C(G, \mathbb C)$ satisfies

$$f(x) = \frac{1}{|G|}\sum_\alpha d_\alpha\sum_{i,j}\Big(\sum_{y\in G} f(y)\overline{(R_\alpha(y)v_j^{(\alpha)}, v_i^{(\alpha)})}\Big)(R_\alpha(x)v_j^{(\alpha)}, v_i^{(\alpha)})$$

and

$$\sum_{x\in G}|f(x)|^2 = \frac{1}{|G|}\sum_\alpha d_\alpha\sum_{i,j}\Big|\sum_{y\in G} f(y)\overline{(R_\alpha(y)v_j^{(\alpha)}, v_i^{(\alpha)})}\Big|^2.$$

Remarks. The first displayed formula is the Fourier inversion formula for an arbitrary finite group $G$ and generalizes Theorem 7.17, which gives the result in the abelian case. In the abelian case all the dimensions $d_\alpha$ equal 1, and the functions $x \mapsto (R_\alpha(x)v_1^{(\alpha)}, v_1^{(\alpha)})$ are just the multiplicative characters of $G$. The second displayed formula is known as the Plancherel formula, a result incorporating the conclusion about norms in Parseval’s equality (Theorem 3.11d).

PROOF. This follows from (a), (c), and (e) in Theorem 7.23, together with Theorem 3.11 and the remarks made before the statement of Theorem 7.23. □


Corollary 7.25. Let $\{(R_\alpha, U_\alpha)\}$ be a complete set of inequivalent irreducible finite-dimensional representations of the finite group $G$, and let $d_\alpha = \dim U_\alpha$. Then $\sum_\alpha d_\alpha^2 = |G|$.

PROOF. This follows by counting the number of members listed in the orthonormal basis of $C(G, \mathbb C)$ given in Corollary 7.24. □

We shall make use of a second multiplication on the vector space $C(G, \mathbb C)$ besides the pointwise multiplication that itself makes $C(G, \mathbb C)$ into a ring. The new multiplication is called convolution and is defined by

$$(f_1 * f_2)(x) = \sum_{y\in G} f_1(y)f_2(y^{-1}x) = \sum_{y\in G} f_1(xy^{-1})f_2(y),$$

the two expressions on the right being equal by a change of variables. The first of the expressions on the right equals the value of the function $\sum_{y\in G} f_1(y)\,\ell(y)f_2$ at $x$ and shows that the convolution is an average of the left translates of $f_2$ weighted by $f_1$. Convolution is associative because

$$\begin{aligned}
(f_1 * (f_2 * f_3))(x) &= \sum_y f_1(y)(f_2 * f_3)(y^{-1}x) = \sum_{y,z} f_1(y)f_2(y^{-1}xz^{-1})f_3(z) \\
&= \sum_z (f_1 * f_2)(xz^{-1})f_3(z) = ((f_1 * f_2) * f_3)(x),
\end{aligned}$$

and one readily checks that $C(G, \mathbb C)$ becomes a ring when convolution is used as the multiplication.

For any finite-dimensional representation $(R, V)$, any $f$ in $C(G, \mathbb C)$, and any $v$ in $V$, let us define $R(f)v = \sum_{x\in G} f(x)R(x)v$. Convolution has the property that

$$R(f_1 * f_2) = R(f_1)R(f_2)$$

because

$$\begin{aligned}
R(f_1 * f_2)v &= \sum_x (f_1 * f_2)(x)R(x)v = \sum_{x,y} f_1(xy^{-1})f_2(y)R(x)v \\
&= \sum_{x,y} f_1(x)f_2(y)R(xy)v = \sum_x f_1(x)R(x)\Big(\sum_y f_2(y)R(y)v\Big) \\
&= \sum_x f_1(x)R(x)R(f_2)v = R(f_1)R(f_2)v.
\end{aligned}$$
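These identities can be checked by machine on $G = S_3$, an arbitrary choice. In the Python sketch below, `R_of` implements $R(f)$ for the permutation representation. The integer-valued sample functions keep every comparison exact, and the helper names are hypothetical.

```python
import itertools

S3 = list(itertools.permutations(range(3)))


def mul(p, q):
    """Group law of S_3: (pq)(i) = p(q(i))."""
    return tuple(p[q[i]] for i in range(3))


def inv(p):
    result = [0] * 3
    for i, pi in enumerate(p):
        result[pi] = i
    return tuple(result)


def convolve(f1, f2):
    """(f1 * f2)(x) = sum over y of f1(y) f2(y^{-1} x)."""
    return {x: sum(f1[y] * f2[mul(inv(y), x)] for y in S3) for x in S3}


def perm_matrix(p):
    """Permutation matrix sending e_j to e_{p[j]}; p -> perm_matrix(p) is a representation."""
    return [[1 if p[j] == i else 0 for j in range(3)] for i in range(3)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def R_of(f):
    """R(f) = sum over x of f(x) R(x) for the permutation representation R."""
    out = [[0] * 3 for _ in range(3)]
    for x in S3:
        P = perm_matrix(x)
        out = [[out[i][j] + f[x] * P[i][j] for j in range(3)] for i in range(3)]
    return out


# Integer-valued sample functions keep all comparisons exact.
f1 = {x: i + 1 for i, x in enumerate(S3)}
f2 = {x: (2 * i) % 5 - 1 for i, x in enumerate(S3)}
f3 = {x: i * i - 3 for i, x in enumerate(S3)}

same_two_forms = convolve(f1, f2) == {
    x: sum(f1[mul(x, inv(y))] * f2[y] for y in S3) for x in S3}
associative = convolve(f1, convolve(f2, f3)) == convolve(convolve(f1, f2), f3)
multiplicative = R_of(convolve(f1, f2)) == matmul(R_of(f1), R_of(f2))
```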

We shall combine the notion of convolution with the notion of a “character.” If $(R, V)$ is a finite-dimensional representation of $G$, then the character of $(R, V)$ is the function $\chi_R$ given by

$$\chi_R(x) = \operatorname{Tr} R(x),$$

with Tr denoting the trace. Equivalent representations have the same character since $\operatorname{Tr}(AR(x)A^{-1}) = \operatorname{Tr} R(x)$ if $A$ is invertible. Characters have the additional properties that

(i) $\chi_R(gxg^{-1}) = \chi_R(x)$ because $\operatorname{Tr} R(gxg^{-1}) = \operatorname{Tr}(R(g)R(x)R(g)^{-1}) = \operatorname{Tr} R(x)$,

(ii) $\chi_{R_1\oplus\cdots\oplus R_n} = \chi_{R_1} + \cdots + \chi_{R_n}$ since the trace of a block-diagonal matrix is the sum of the traces of the blocks.
The character of a 1-dimensional representation is the associated multiplicative character. Here is an example of a character for a representation on a space of dimension more than 1; its values are not all in $S^1$.

Example. The dihedral group $D_n$ with $2n$ elements, defined in Section IV.1, is isomorphic to the matrix group generated by

$$x = \begin{pmatrix}\cos 2\pi/n & -\sin 2\pi/n\\ \sin 2\pi/n & \cos 2\pi/n\end{pmatrix} \quad\text{and}\quad y = \begin{pmatrix}1 & 0\\ 0 & -1\end{pmatrix}.$$

The map carrying each matrix of the group to itself is a representation of $D_n$ on $\mathbb C^2$. The value of the character of this representation is $2\cos 2\pi k/n$ on $x^k$ for $0 \le k \le n - 1$, and the value of the character is 0 on $y$ and on the remaining $n - 1$ elements of the group.
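A Python sketch can confirm the character values stated in the Example for a few values of $n$. Elements $x^ky^e$ of $D_n$ are encoded as pairs `(k, e)`, a hypothetical encoding. They are multiplied using the relation $yx = x^{-1}y$, which the matrices satisfy. The sketch also checks that the matrices multiply according to this group law.

```python
import math


def d_mul(a, b, n):
    """Product in D_n of x^k1 y^e1 and x^k2 y^e2, using the relation y x = x^{-1} y."""
    (k1, e1), (k2, e2) = a, b
    return ((k1 + (-k2 if e1 else k2)) % n, (e1 + e2) % 2)


def rho(g, n):
    """Matrix of x^k y^e: rotation by 2*pi*k/n times diag(1, -1)^e."""
    k, e = g
    c, s = math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n)
    sign = -1 if e else 1
    return [[c, -s * sign], [s, c * sign]]


def matmul(a, b):
    return [[sum(a[i][m] * b[m][j] for m in range(2)) for j in range(2)] for i in range(2)]


def close(a, b, tol=1e-9):
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))


def character(g, n):
    """Trace of rho(g)."""
    m = rho(g, n)
    return m[0][0] + m[1][1]
```

For $x^k$ the trace is $2\cos 2\pi k/n$. For $x^ky$ the two diagonal entries are $\cos 2\pi k/n$ and $-\cos 2\pi k/n$, so the trace is 0, matching the Example.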

Computations with characters are sometimes aided by the use of inner products. If an inner product is imposed on a finite-dimensional complex vector space $V$ and if $\{v_i\}$ is an orthonormal basis, then the trace of a linear map $A : V \to V$ is given by $\operatorname{Tr} A = \sum_i (Av_i, v_i)$. If $R$ is a representation on $V$, we consequently have $\chi_R(x) = \sum_i (R(x)v_i, v_i)$.


Proposition 7.26. Let $R$, $R_1$, and $R_2$ be irreducible finite-dimensional representations of a finite group $G$. Then their characters satisfy

(a) $\sum_{x\in G}|\chi_R(x)|^2 = |G|$,

(b) $\sum_{x\in G}\chi_{R_1}(x)\overline{\chi_{R_2}(x)} = 0$ if $R_1$ and $R_2$ are inequivalent.

PROOF. These follow from Schur orthogonality (Proposition 7.22). For (a), let $R$ act on the vector space $V$, let $d = \dim V$, introduce an inner product with respect to which $R$ is unitary, and let $\{v_i\}$ be an orthonormal basis of $V$. Then Proposition 7.22b gives

$$\begin{aligned}
\sum_x|\chi_R(x)|^2 &= \sum_x\Big(\sum_i (R(x)v_i, v_i)\Big)\overline{\Big(\sum_j (R(x)v_j, v_j)\Big)} \\
&= \sum_{i,j}\sum_x (R(x)v_i, v_i)\overline{(R(x)v_j, v_j)} \\
&= \sum_{i,j}|G|\,d^{-1}\delta_{ij}\delta_{ij} = \sum_i |G|\,d^{-1} = |G|.
\end{aligned}$$

Part (b) is proved in the same fashion, using Proposition 7.22a. □
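For $D_3$, the character values computed in the Example, together with the trivial and sign characters, can be used to check Proposition 7.26 numerically. That these three characters form a complete list for $D_3$ is a standard fact assumed here; it is not proved in this section. Granting it, the last line also illustrates Corollary 7.25: $1^2 + 1^2 + 2^2 = 6$.

```python
import math

N = 3
G = [(k, e) for k in range(N) for e in range(2)]  # (k, e) encodes x^k y^e in D_3


def chi_trivial(k, e):
    return 1.0


def chi_sign(k, e):
    return -1.0 if e else 1.0


def chi_two_dim(k, e):
    # Character of the 2-dimensional representation from the Example.
    return 2 * math.cos(2 * math.pi * k / N) if e == 0 else 0.0


chars = [chi_trivial, chi_sign, chi_two_dim]


def pairing(chi1, chi2):
    """Sum over x of chi1(x) * conj(chi2(x)); all values here are real."""
    return sum(chi1(*x) * chi2(*x) for x in G)


gram = [[pairing(a, b) for b in chars] for a in chars]
dims = [chi(0, 0) for chi in chars]  # chi(1) is the dimension of the representation
sum_of_squares = sum(dim ** 2 for dim in dims)
```

Up to rounding, `gram` is 6 times the identity matrix, and `sum_of_squares` equals $|G| = 6$.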

Let us now bring together the notions of convolution and character. A class function on $G$ is a function $f$ in $C(G, \mathbb{C})$ with $f(gxg^{-1}) = f(x)$ for all $g$ and $x$ in $G$. That is, class functions are the ones that are constant on each conjugacy class of the group. Every character is an example of a class function. The class functions form a vector subspace of $C(G, \mathbb{C})$, and the dimension of this vector subspace equals the number of conjugacy classes in $G$. Class functions are closed under convolution because if $f_1$ and $f_2$ are class functions, then
$$\begin{aligned}
(f_1 * f_2)(gxg^{-1}) &= \sum_y f_1(gxg^{-1}y^{-1})f_2(y) = \sum_y f_1(xg^{-1}y^{-1}g)f_2(g^{-1}yg) \\
&= \sum_z f_1(xz^{-1})f_2(z) = (f_1 * f_2)(x).
\end{aligned}$$

On an abelian group every member of $C(G, \mathbb{C})$ is a class function.
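Both facts about class functions (their space has dimension equal to the number of conjugacy classes, and convolution preserves them) can be tested by brute force on $S_3$. In this sketch permutations of $\{0, 1, 2\}$ are tuples multiplied by $(pq)(i) = p(q(i))$, convolution is taken as $(f_1 * f_2)(x) = \sum_y f_1(xy^{-1})f_2(y)$, the form used in the computation above, and the random class functions are arbitrary test data. All helper names are ours.

```python
import random
from itertools import permutations

G = list(permutations(range(3)))  # the six elements of S_3


def compose(p, q):
    """The product pq, acting by (pq)(i) = p(q(i))."""
    return tuple(p[q[i]] for i in range(len(q)))


def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)


def conjugacy_classes(group):
    remaining, classes = set(group), []
    while remaining:
        x = next(iter(remaining))
        cls = {compose(compose(g, x), inverse(g)) for g in group}
        classes.append(cls)
        remaining -= cls
    return classes


def convolve(f1, f2):
    """(f1 * f2)(x) = sum over y of f1(x y^{-1}) f2(y)."""
    return {x: sum(f1[compose(x, inverse(y))] * f2[y] for y in G) for x in G}


def is_class_function(f):
    return all(f[compose(compose(g, x), inverse(g))] == f[x]
               for g in G for x in G)


classes = conjugacy_classes(G)


def random_class_function(rng):
    """Pick one value per conjugacy class and spread it over the class."""
    values = [rng.randint(-5, 5) for _ in classes]
    return {x: v for cls, v in zip(classes, values) for x in cls}


rng = random.Random(0)
f1, f2 = random_class_function(rng), random_class_function(rng)
assert len(classes) == 3  # so class functions on S_3 form a 3-dimensional space
assert is_class_function(convolve(f1, f2))
```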

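Theorem 7.27 (Fourier inversion formula for class functions). For the finite group $G$, let $\{(R_\alpha, U_\alpha)\}$ be a complete set of inequivalent irreducible finite-dimensional representations of $G$, and write $\chi_\alpha$ for the character of $R_\alpha$. If $f$ is a class function on $G$, then
$$f(x) = \frac{1}{|G|}\sum_\alpha\Big(\sum_{y\in G} f(y)\overline{\chi_\alpha(y)}\Big)\chi_\alpha(x).$$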

Remark.  This  result  may  be  regarded  as  a  second  way  (besides  the  one  in 
Corollary  7.24)  of  generalizing  Theorem  7.17  to  the  nonabelian  case. 

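PROOF. Using the result and notation of Corollary 7.24, we have
$$f(x) = |G|^{-1}\sum_\alpha d_\alpha \sum_{i,j}\Big(\sum_{y\in G} f(y)\overline{(R_\alpha(y)v_j^{(\alpha)}, v_i^{(\alpha)})}\Big)(R_\alpha(x)v_j^{(\alpha)}, v_i^{(\alpha)}).$$
Replace $f(y)$ by $f(gyg^{-1})$ since $f$ is a class function, and then change variables and sum over $g$ in $G$ to see that $|G|f(x)$ is equal to
$$|G|^{-1}\sum_\alpha d_\alpha \sum_{i,j}\Big(\sum_{g,y} f(y)\overline{(R_\alpha(y)R_\alpha(g)v_j^{(\alpha)}, R_\alpha(g)v_i^{(\alpha)})}\Big)(R_\alpha(x)v_j^{(\alpha)}, v_i^{(\alpha)}).$$
Within this expression we have
$$\begin{aligned}
\sum_g (R_\alpha(y)R_\alpha(g)v_j^{(\alpha)}, R_\alpha(g)v_i^{(\alpha)})
&= \sum_{g,k} \big(R_\alpha(y)(R_\alpha(g)v_j^{(\alpha)}, v_k^{(\alpha)})v_k^{(\alpha)}, R_\alpha(g)v_i^{(\alpha)}\big) \\
&= \sum_{g,k} (R_\alpha(g)v_j^{(\alpha)}, v_k^{(\alpha)})\overline{(R_\alpha(g)v_i^{(\alpha)}, R_\alpha(y)v_k^{(\alpha)})} \\
&= \frac{|G|}{d_\alpha}\sum_k (v_j^{(\alpha)}, v_i^{(\alpha)})\overline{(v_k^{(\alpha)}, R_\alpha(y)v_k^{(\alpha)})} \qquad\text{by Schur orthogonality} \\
&= \frac{|G|}{d_\alpha}\,\delta_{ij}\sum_k (R_\alpha(y)v_k^{(\alpha)}, v_k^{(\alpha)}) \\
&= \frac{|G|}{d_\alpha}\,\delta_{ij}\,\chi_\alpha(y).
\end{aligned}$$
Substituting, we obtain the formula of the theorem. □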
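Theorem 7.27 can be tried out on $S_3$, assuming its standard character table (the characters 1, sgn, and the 2-dimensional character with values $2, 0, -1$ on the identity, the transpositions, and the 3-cycles). The sketch stores a class function by its values on these three classes, computes each coefficient $\sum_y f(y)\overline{\chi_\alpha(y)}$ class by class, and rebuilds $f$ with exact rational arithmetic. Every character of $S_3$ is real-valued, so complex conjugation is omitted; the function name is ours.

```python
from fractions import Fraction

# Standard character table of S_3 (assumed); classes ordered as
# identity, transpositions, 3-cycles.
class_sizes = [1, 3, 2]
irreducible_characters = [[1, 1, 1], [1, -1, 1], [2, 0, -1]]
order = sum(class_sizes)


def fourier_inversion(f):
    """Rebuild a class function f, given by its values on the classes, from
    f(x) = |G|^{-1} sum_alpha (sum_y f(y) conj(chi_alpha(y))) chi_alpha(x).
    All characters of S_3 are real, so no conjugation is needed."""
    coefficients = [sum(s * fv * c for s, fv, c in zip(class_sizes, f, chi))
                    for chi in irreducible_characters]
    return [Fraction(sum(coef * chi[j] for coef, chi
                         in zip(coefficients, irreducible_characters)), order)
            for j in range(len(class_sizes))]


f = [7, -2, 5]  # an arbitrary class function on S_3
assert fourier_inversion(f) == f
```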

Corollary 7.28. If $G$ is a finite group, then the number of irreducible finite-dimensional representations of $G$, up to equivalence, equals the number of conjugacy classes of $G$.

PROOF.  Theorem  7.27  shows  that  the  irreducible  characters  span  the  vector 
space  of  class  functions.  Proposition  7.26b  shows  that  the  irreducible  characters 
are  orthogonal  and  hence  are  linearly  independent.  Thus  the  number  of  irreducible 
characters  equals  the  dimension  of  the  space  of  class  functions,  which  equals  the 
number  of  conjugacy  classes.  □ 

Example. The above information already gives us considerable control over finding a complete set of inequivalent irreducible finite-dimensional representations of elementary groups. We know that the number of such representations equals the number of conjugacy classes and that the sum of the squares of their dimensions equals $|G|$. For the symmetric group $S_3$ of order 6, for example, the conjugacy classes are given by the cycle structures of the possible permutations, namely the cycle structures of (1), (1 2), and (1 2 3). Hence there are three inequivalent irreducible representations. The sum of the squares of the three dimensions is to be 6; thus we have two of dimension 1 and one of dimension 2. The multiplicative characters 1 and sgn are the two of dimension 1, and the one of dimension 2 can be taken to be the 2-dimensional representation of $D_3 \cong S_3$ whose character was computed in the example preceding Proposition 7.26.
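The counting argument in this example is easy to mechanize. The hypothetical helper below lists the nondecreasing tuples of dimensions that begin with 1 (for the trivial representation), have one entry per conjugacy class, and have squares summing to $|G|$; for $S_3$ it confirms that $(1, 1, 2)$ is the only possibility.

```python
def dimension_candidates(order, num_classes):
    """Nondecreasing tuples (1, d_2, ..., d_k) with k = num_classes entries
    whose squares sum to order; the leading 1 is the trivial representation."""
    results = []

    def extend(prefix, remaining):
        if len(prefix) == num_classes:
            if remaining == 0:
                results.append(tuple(prefix))
            return
        d = prefix[-1]
        while d * d <= remaining:
            extend(prefix + [d], remaining - d * d)
            d += 1

    extend([1], order - 1)
    return results


# S_3: order 6 and three conjugacy classes
assert dimension_candidates(6, 3) == [(1, 1, 2)]
```

The same search with order 8 and five classes (the situation for $D_4$ and $H_8$) leaves only $(1, 1, 1, 1, 2)$.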

One  final  constraint  on  the  dimensions  of  the  irreducible  representations  of  a 
finite  group  G  is  as  follows. 

Proposition 7.29. If $G$ is a finite group and $(R, V)$ is an irreducible finite-dimensional representation of $G$, then $\dim V$ divides $|G|$.

For example, if $|G| = p^2$ with $p$ prime, then it follows from Propositions 7.29 and 7.25 that every irreducible finite-dimensional representation of $G$ has dimension 1 (each dimension divides $p^2$, and a dimension $p$ or $p^2$ would leave no room in the sum of squares $p^2$ for the trivial representation), and one can easily conclude from this fact that $G$ is abelian. (See Problem 14 at the end of the chapter.) Thus we recover as an immediate consequence the conclusion of Corollary 4.39 that groups of order $p^2$ are abelian.
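The argument for $|G| = p^2$ uses only two constraints: each dimension divides $|G|$ (Proposition 7.29), and the squares of the dimensions sum to $|G|$ (Proposition 7.25), with the trivial representation contributing a 1. A small search over exactly those constraints, sketched below with a helper name of our own, returns only the all-ones tuple when $|G| = p^2$.

```python
def admissible_dimensions(order):
    """Nondecreasing tuples (1, d_2, ..., d_k) of divisors of order whose
    squares sum to order (the constraints of Propositions 7.25 and 7.29)."""
    divisors = [d for d in range(1, order + 1) if order % d == 0]
    results = []

    def extend(prefix, remaining):
        if remaining == 0:
            results.append(tuple(prefix))
            return
        for d in divisors:
            if d >= prefix[-1] and d * d <= remaining:
                extend(prefix + [d], remaining - d * d)

    extend([1], order - 1)
    return results


for p in (2, 3, 5, 7):
    # only 1-dimensional representations survive when |G| = p^2
    assert admissible_dimensions(p * p) == [(1,) * (p * p)]
```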

The  proof  of  Proposition  7.29  is  surprisingly  subtle.  We  shall  obtain  the 
theorem  as  a  consequence  of  Theorem  7.31  below,  a  theorem  that  will  be  used 
also  in  the  proof  of  Burnside's  Theorem  in  the  next  section.  Theorem  7.31  gives  a 
little  taste  of  the  usefulness  of  algebraic  number  theory,  and  we  shall  see  more  of 
this  usefulness  in  Chapter  IX.  The  application  to  Burnside’s  Theorem  will  use  the 
Fundamental  Theorem  of  Galois  Theory,  whose  proof  is  deferred  to  Chapter  IX. 

An algebraic integer is any complex number that is a root of a monic polynomial with coefficients in $\mathbb{Z}$. For example, $\sqrt 2$ and $\frac12(1 + i\sqrt 3)$ are algebraic
integers because they are roots of $X^2 - 2$ and $X^2 - X + 1$, respectively. Any root of unity is an algebraic integer, being a root of some polynomial $X^n - 1$. The set of algebraic integers will be denoted in this chapter by $\mathcal{O}$. Before stating Theorem 7.31, let us establish two elementary facts about $\mathcal{O}$.

Lemma 7.30. The set $\mathcal{O}$ of algebraic integers is a ring, and $\mathcal{O} \cap \mathbb{Q} = \mathbb{Z}$.

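PROOF. Suppose that $x$ and $y$ are complex numbers satisfying the polynomial equations $x^m + a_{m-1}x^{m-1} + \cdots + a_1x + a_0 = 0$ and $y^n + b_{n-1}y^{n-1} + \cdots + b_1y + b_0 = 0$, each with integer coefficients. Form the subset of $\mathbb{C}$ given by
$$M = \sum_{k=0}^{m-1}\sum_{l=0}^{n-1} \mathbb{Z}\,x^ky^l.$$
This is a finitely generated subgroup of the abelian group $\mathbb{C}$ under addition. It satisfies
$$\begin{aligned}
xM &= \sum_{k=1}^{m}\sum_{l=0}^{n-1} \mathbb{Z}\,x^ky^l \subseteq M + \sum_{l=0}^{n-1}\mathbb{Z}\,x^my^l \\
&= M + \sum_{l=0}^{n-1}\mathbb{Z}\,y^l(-a_{m-1}x^{m-1} - \cdots - a_1x - a_0) \subseteq M,
\end{aligned}$$
and similarly $yM \subseteq M$. Hence $(x \pm y)M \subseteq M$ and $xyM \subseteq M$.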

To prove that $\mathcal{O}$ is a ring, it is enough to show that if $N$ is a nonzero finitely generated subgroup of the abelian group $\mathbb{C}$ under addition and if $z$ is a complex number with $zN \subseteq N$, then $z$ is an algebraic integer. (The subgroup $M$ above is nonzero, since it contains $x^0y^0 = 1$.) By Theorem 4.56, $N$ is a direct sum of cyclic groups. Since every nonzero member of $\mathbb{C}$ has infinite order additively, these cyclic groups must be copies of $\mathbb{Z}$. So $N$ is free abelian. Let $z_1, \dots, z_n$ be a $\mathbb{Z}$ basis of $N$. Here $n > 0$. Since $zN \subseteq N$, we can find unique integers $c_{ij}$ such that
$$zz_i = \sum_{j=1}^n c_{ij}z_j \qquad\text{for } 1 \le i \le n.$$
This equation says that the matrix $C = [c_{ij}]$ has $\begin{pmatrix} z_1 \\ \vdots \\ z_n \end{pmatrix}$ as an eigenvector with eigenvalue $z$. Therefore the matrix $zI - C$ is singular, and $\det(zI - C) = 0$. Since $\det(zI - C)$ is a monic polynomial expression in $z$ with integer coefficients, $z$ is an algebraic integer.

To see that $\mathcal{O} \cap \mathbb{Q} = \mathbb{Z}$, let $p$ and $q$ be relatively prime integers with $q > 0$, and suppose that $p/q$ is a root of $X^n + a_{n-1}X^{n-1} + \cdots + a_1X + a_0$ with $a_{n-1}, \dots, a_0$ in $\mathbb{Z}$. Substituting $p/q$ for $X$, setting the expression equal to 0, and clearing fractions, we obtain $p^n + a_{n-1}p^{n-1}q + \cdots + a_1pq^{n-1} + a_0q^n = 0$. Since $q$ divides every term here after the first, we conclude that $q$ divides $p^n$. Since $\mathrm{GCD}(p, q) = 1$, we conclude that $q = 1$. Thus $p/q$ is in $\mathbb{Z}$. □
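The first half of this proof is constructive: multiplication by $x$ and by $y$ acts on the finitely many generators $x^ky^l$ through integer matrices, and a determinant $\det(zI - C)$ then yields a monic integer polynomial satisfied by $x + y$ or $xy$. The Python sketch below follows that idea using a standard linear-algebra device that the text itself does not use: companion matrices $A$ and $B$ of the two polynomials, combined as $A \otimes I + I \otimes B$ (for the sum) and $A \otimes B$ (for the product), with characteristic polynomials computed exactly by the Faddeev-LeVerrier recursion. All function names are ours. Applied to $\sqrt 2$ and $\sqrt 3$, it recovers $X^4 - 10X^2 + 1$ for $\sqrt 2 + \sqrt 3$ and $(X^2 - 6)^2$ for $\sqrt 6$.

```python
from fractions import Fraction


def companion(coeffs):
    """Companion matrix of X^m + c_{m-1} X^{m-1} + ... + c_0, where
    coeffs = [c_0, ..., c_{m-1}]; its characteristic polynomial is that polynomial."""
    m = len(coeffs)
    C = [[0] * m for _ in range(m)]
    for i in range(1, m):
        C[i][i - 1] = 1
    for i in range(m):
        C[i][m - 1] = -coeffs[i]
    return C


def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]


def kron(A, B):
    """Kronecker product of square matrices A (p x p) and B (q x q)."""
    p, q = len(A), len(B)
    return [[A[i // q][j // q] * B[i % q][j % q] for j in range(p * q)]
            for i in range(p * q)]


def mat_add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def char_poly(A):
    """Coefficients [c_0, ..., c_n] of det(X I - A) via Faddeev-LeVerrier,
    in exact rational arithmetic; for an integer matrix they are integers."""
    n = len(A)
    c = [Fraction(0)] * (n + 1)
    c[n] = Fraction(1)
    M = [[Fraction(0)] * n for _ in range(n)]
    for k in range(1, n + 1):
        AM = matmul(A, M)
        M = [[AM[i][j] + (c[n - k + 1] if i == j else 0) for j in range(n)]
             for i in range(n)]
        AM = matmul(A, M)
        c[n - k] = -sum(AM[i][i] for i in range(n)) / k
    assert all(x.denominator == 1 for x in c)
    return [int(x) for x in c]


def sum_and_product_polys(p, q):
    """Monic integer polynomials having x + y and x y as roots, where x and y
    are roots of the monic integer polynomials p and q (coefficient lists)."""
    A, B = companion(p), companion(q)
    m, n = len(A), len(B)
    sum_matrix = mat_add(kron(A, identity(n)), kron(identity(m), B))
    return char_poly(sum_matrix), char_poly(kron(A, B))


s_poly, p_poly = sum_and_product_polys([-2, 0], [-3, 0])
assert s_poly == [1, 0, -10, 0, 1]    # X^4 - 10X^2 + 1, root sqrt(2) + sqrt(3)
assert p_poly == [36, 0, -12, 0, 1]   # (X^2 - 6)^2, root sqrt(6)
```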

Lemma 7.30 allows us to see that if $G$ is a finite group and $\chi$ is the irreducible character corresponding to an irreducible finite-dimensional representation $R$, then $\chi(x)$ is an algebraic integer for each $x$ in $G$. In fact, the subgroup $H$ of $G$ generated by $x$ is cyclic and is in particular abelian. Corollary 7.21 says that $R|_H$ is the direct sum of irreducible representations of $H$, and Corollary 7.19 says that each such irreducible representation is 1-dimensional. Thus in a suitable basis, $R|_H$ is diagonal. The diagonal entries must be roots of unity (in fact, $N$th roots of unity if $x$ has order $N$), and $\chi(x)$ is thus a sum of roots of unity. By Lemma 7.30, $\chi(x)$ is an algebraic integer.

Theorem 7.31. Let $G$ be a finite group, $(R, V)$ be an irreducible finite-dimensional representation of $G$, $\chi$ be the character of $R$, and $C$ be a conjugacy class in $G$. Denote by $\chi(C)$ the constant value of $\chi$ on the conjugacy class $C$. Then $|C|\chi(C)/\dim V$ is an algebraic integer.

PROOF. If $f$ is any class function on $G$, then $R(f)$ commutes with each $R(x)$ for $x$ in $G$ because $R(f) = \sum_y f(y)R(y)$ yields
$$\begin{aligned}
R(x)R(f)R(x)^{-1} &= \sum_y f(y)R(x)R(y)R(x)^{-1} = \sum_y f(y)R(xyx^{-1}) \\
&= \sum_z f(x^{-1}zx)R(z) = \sum_z f(z)R(z) = R(f).
\end{aligned}$$
By Schur's Lemma (Proposition 7.18), $R(f)$ is scalar. If $C$ is a conjugacy class, then the function $I_C$ that is 1 on $C$ and is 0 elsewhere is a class function, and hence $R(I_C)$ is a scalar $\lambda_C$. As $C$ varies, the functions $I_C$ form a vector-space basis of the space of class functions. The formula $(I_C * I_{C'})(x) = \sum_y I_C(y)I_{C'}(y^{-1}x)$ shows that $I_C * I_{C'}$ is integer-valued, and we have seen that the convolution of two class functions is a class function. Therefore $I_C * I_{C'} = \sum_{C''} n_{CC'C''}I_{C''}$ for suitable integers $n_{CC'C''}$. Application of $R$ gives $\lambda_C\lambda_{C'} = \sum_{C''} n_{CC'C''}\lambda_{C''}$. If we fix $C$ and let $A$ be the square matrix with entries $A_{C'C''} = n_{CC'C''}$, we obtain
$$\lambda_C\lambda_{C'} = \sum_{C''} A_{C'C''}\lambda_{C''}.$$
This equation says that the matrix $A$ has the column vector with entries $\lambda_{C''}$ as an eigenvector with eigenvalue $\lambda_C$ (the vector is nonzero, since $\lambda_{\{1\}} = 1$). Therefore the matrix $\lambda_C I - A$ is singular, and $\det(\lambda_C I - A) = 0$. Since $\det(\lambda_C I - A)$ is a monic polynomial expression in $\lambda_C$ with integer coefficients, $\lambda_C$ is an algebraic integer. Taking the trace of the equation $R(I_C) = \lambda_C I$, we obtain $\sum_{x\in C}\chi(x) = \lambda_C\dim V$. Since $\chi(x) = \chi(C)$ for $x$ in $C$, the result is that $|C|\chi(C)/\dim V = \lambda_C$. Since $\lambda_C$ is an algebraic integer, $|C|\chi(C)/\dim V$ is an algebraic integer. □
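For $S_3$ the objects in this proof can be computed directly. The sketch below counts the integers $n_{CC'C''} = (I_C * I_{C'})(x)$, $x \in C''$, takes $C$ to be the class of transpositions, and checks that the vector $(\lambda_{C''}) = (|C''|\chi(C'')/\dim V)$ is an eigenvector of $A$ with eigenvalue $\lambda_C$ for each irreducible character. The standard character table of $S_3$ (values $(1,1,1)$, $(1,-1,1)$, $(2,0,-1)$ on the identity, transpositions, and 3-cycles) is assumed, classes are labeled by cycle type, and the helper names are ours.

```python
from itertools import permutations

G = list(permutations(range(3)))  # the six elements of S_3


def compose(p, q):
    """The product pq, acting by (pq)(i) = p(q(i))."""
    return tuple(p[q[i]] for i in range(len(q)))


def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)


def cycle_type(p):
    """Sorted cycle lengths of p; in S_n they label the conjugacy classes."""
    seen, lengths = set(), []
    for start in range(len(p)):
        if start not in seen:
            length, j = 0, start
            while j not in seen:
                seen.add(j)
                j = p[j]
                length += 1
            lengths.append(length)
    return tuple(sorted(lengths))


# Classes ordered as identity, transpositions, 3-cycles.
classes = [[g for g in G if cycle_type(g) == t] for t in [(1, 1, 1), (1, 2), (3,)]]


def structure_constant(C, C1, C2):
    """n_{C C1 C2} = (I_C * I_C1)(x) for x in C2, i.e. #{y in C : y^{-1} x in C1}."""
    x = C2[0]
    return sum(1 for y in C if compose(inverse(y), x) in C1)


C = classes[1]  # fix C = the transpositions
A = [[structure_constant(C, C1, C2) for C2 in classes] for C1 in classes]

# Standard character table of S_3 (assumed), in the same class order.
for chi in ([1, 1, 1], [1, -1, 1], [2, 0, -1]):
    d = chi[0]
    assert all(len(K) * chi[k] % d == 0 for k, K in enumerate(classes))
    lam = [len(K) * chi[k] // d for k, K in enumerate(classes)]  # lambda_{C''}
    A_lam = [sum(A[i][j] * lam[j] for j in range(3)) for i in range(3)]
    assert A_lam == [lam[1] * v for v in lam]  # eigenvalue lambda_C, C = classes[1]
```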

Proof that Theorem 7.31 implies Proposition 7.29. Proposition 7.26a gives
$$\frac{|G|}{\dim V} = \sum_{x\in G}\frac{|\chi(x)|^2}{\dim V} = \sum_C\sum_{x\in C}\frac{|\chi(x)|^2}{\dim V} = \sum_C\Big(\frac{|C|\chi(C)}{\dim V}\Big)\overline{\chi(C)}.$$
Each term in parentheses on the right side is an algebraic integer, according to Theorem 7.31, and so is each $\overline{\chi(C)}$, being a sum of roots of unity. Therefore Lemma 7.30 shows that $|G|/\dim V$ is an algebraic integer. Since $|G|/\dim V$ is in $\mathbb{Q}$, Lemma 7.30 shows that $|G|/\dim V$ is in $\mathbb{Z}$. □
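Theorem 7.31, Proposition 7.29, and the identity used in this proof can be checked against the character table of $S_4$, which the sketch assumes in its standard form (classes of sizes 1, 6, 3, 8, 6 for the identity, the transpositions, the double transpositions, the 3-cycles, and the 4-cycles). All characters of $S_4$ are real, so complex conjugation is dropped.

```python
from fractions import Fraction

# Standard character table of S_4 (assumed). Classes: identity,
# transpositions, double transpositions, 3-cycles, 4-cycles.
class_sizes = [1, 6, 3, 8, 6]
table = [
    [1, 1, 1, 1, 1],      # trivial
    [1, -1, 1, 1, -1],    # sign
    [2, 0, 2, -1, 0],     # 2-dimensional, through S_4 -> S_3
    [3, 1, -1, 0, -1],    # standard
    [3, -1, -1, 0, 1],    # standard tensor sign
]
order = sum(class_sizes)  # |G| = 24

for chi in table:
    d = chi[0]
    # Theorem 7.31: |C| chi(C) / dim V is an algebraic integer; being
    # rational here, it must lie in Z by Lemma 7.30.
    assert all(Fraction(s * v, d).denominator == 1
               for s, v in zip(class_sizes, chi))
    # Proposition 7.29: dim V divides |G|.
    assert order % d == 0
    # The identity in the proof (real characters, so no conjugation needed):
    # |G| / dim V = sum over C of (|C| chi(C) / dim V) chi(C).
    assert sum(Fraction(s * v, d) * v
               for s, v in zip(class_sizes, chi)) == Fraction(order, d)
```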


5.  Burnside’s  Theorem 


The  theorem  of  this  section  is  as  follows. 


Theorem 7.32 (Burnside's Theorem). If $G$ is a finite group of order $p^aq^b$ with $p$ and $q$ prime and with $a + b > 1$, then $G$ has a nontrivial proper normal subgroup.

The argument will use the result Theorem 7.31 from algebraic number theory, and also it will make use of a special case of the Fundamental Theorem of Galois Theory, whose proof is deferred to Chapter IX. That special case is the following statement, whose context was anticipated in Section IV.1, where groups of automorphisms of certain fields were discussed briefly. Since the set $\{1, e^{2\pi i/n}, e^{2\cdot 2\pi i/n}, e^{3\cdot 2\pi i/n}, \dots\}$ is linearly dependent over $\mathbb{Q}$, Proposition 4.1 in that section implies that the subring $\mathbb{Q}[e^{2\pi i/n}]$ of $\mathbb{C}$ generated by $\mathbb{Q}$ and $e^{2\pi i/n}$ is a subfield and is a finite-dimensional vector space over $\mathbb{Q}$. According to Example 9 of that section, the group $\Gamma = \mathrm{Gal}(\mathbb{Q}[e^{2\pi i/n}]/\mathbb{Q})$ of automorphisms of $\mathbb{Q}[e^{2\pi i/n}]$ fixing every element of $\mathbb{Q}$ is a finite group.


Proposition 7.33 (special case of the Fundamental Theorem of Galois Theory). Let $n > 0$ be an integer, and put $K = \mathbb{Q}[e^{2\pi i/n}]$. Let $\Gamma$ be the finite group of field automorphisms of $K$ fixing every element of $\mathbb{Q}$. Then the only members $\beta$ of $K$ such that $\sigma(\beta) = \beta$ for every $\sigma$ in $\Gamma$ are the members of $\mathbb{Q}$.
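The way Proposition 7.33 gets used can be seen numerically. The sketch below relies on a standard fact that this section does not prove: the automorphisms of $K = \mathbb{Q}[\zeta]$, $\zeta = e^{2\pi i/n}$, fixing $\mathbb{Q}$ are the maps $\sigma_k: \zeta \mapsto \zeta^k$ with $\mathrm{GCD}(k, n) = 1$. For $\alpha = (1 + \zeta)/2$ with $n = 5$, every $\sigma_k(\alpha)$ has absolute value less than 1, and the product of all of them, which is fixed by every $\sigma_k$, equals the rational number $1/16$ (exactly, since $\prod_{k=1}^{4}(1 + \zeta^k) = \Phi_5(-1) = 1$). As $1/16$ is not an integer, Lemma 7.30 shows that this $\alpha$ is not an algebraic integer; this is the contrapositive form of the argument in Lemma 7.34 below. The function names are ours.

```python
import cmath
from math import gcd


def galois_images(n, beta):
    """beta is a dict {j: c_j} standing for sum_j c_j zeta^j, zeta = e^{2 pi i/n}.
    Returns sigma_k(beta) for the automorphisms sigma_k: zeta -> zeta^k,
    1 <= k <= n, gcd(k, n) = 1 (the assumed description of Gamma)."""
    zeta = cmath.exp(2j * cmath.pi / n)
    return [sum(c * zeta ** (j * k) for j, c in beta.items())
            for k in range(1, n + 1) if gcd(k, n) == 1]


def norm(n, beta):
    """Product of all images sigma_k(beta); it is fixed by every sigma_k."""
    result = 1
    for value in galois_images(n, beta):
        result *= value
    return result


alpha = {0: 0.5, 1: 0.5}  # alpha = (1 + zeta) / 2 with n = 5
assert all(abs(v) < 1 for v in galois_images(5, alpha))
assert abs(norm(5, alpha) - 1 / 16) < 1e-12  # rational, but not an integer
```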


Lemma 7.34. Let $G$ be a finite group, $(R, V)$ be an irreducible finite-dimensional representation of $G$, $\chi$ be the character of $R$, and $C$ be a conjugacy class in $G$. If $\mathrm{GCD}(|C|, \dim V) = 1$ and if $x$ is in $C$, then either $R(x)$ is scalar or $\chi(x) = 0$.


PROOF. Define $\chi(C)$ to be the constant value of $\chi$ on $C$, and put $\alpha = \chi(x)/\dim V = \chi(C)/\dim V$. Since $\mathrm{GCD}(|C|, \dim V) = 1$, we can choose integers $m$ and $n$ with $m|C| + n\dim V = 1$. Multiplication by $\alpha$ yields
$$m\,\frac{|C|\chi(C)}{\dim V} + n\chi(C) = \alpha.$$
Theorem 7.31 and the discussion preceding it show that the coefficients $|C|\chi(C)/\dim V$ and $\chi(C)$ of $m$ and $n$ on the left side are algebraic integers, and therefore $\alpha$ is an algebraic integer. As we observed toward the end of the previous section, $\chi(x) = \chi(C)$ is the sum of $\dim V$ roots of unity. Since $\alpha = \chi(C)/\dim V$, we see that $|\alpha| \le 1$ with equality only if all the roots of unity are equal, in which case $R(x)$ is scalar. Thus we may assume that $R(x)$ is not scalar and hence that $|\alpha| < 1$. We shall show that $\alpha = 0$.

Let $K = \mathbb{Q}[e^{2\pi i/|G|}]$ be the smallest subfield of $\mathbb{C}$ containing $\mathbb{Q}$ and the complex number $e^{2\pi i/|G|}$, and let $\Gamma$ be the group of field automorphisms of $K$ that fix every element of $\mathbb{Q}$. We know that $K$ is finite-dimensional over $\mathbb{Q}$ and that $\Gamma$ is a finite group, and Proposition 7.33 shows that the only members of $K$ fixed by every element of $\Gamma$ are the members of $\mathbb{Q}$.

Our element $x$ of $G$ has $x^{|G|} = 1$. Thus every root of unity contributing to $\chi(x)$ is a $|G|$th root of unity and is in $K$. Therefore the algebraic integer $\alpha$ is in $K$. If $\sigma$ is in $\Gamma$, each of the $|G|$th roots of unity is mapped by $\sigma$ to some complex number $z$ satisfying $z^{|G|} = 1$, and hence the member $\sigma(\alpha)$ of $K$ satisfies $|\sigma(\alpha)| \le 1$. Also, $\sigma(\alpha)$ is an algebraic integer, as we see by applying $\sigma$ to the monic equation with integer coefficients satisfied by $\alpha$, and we are assuming that $|\alpha| < 1$. Consequently $\beta = \prod_{\sigma\in\Gamma}\sigma(\alpha)$ is an algebraic integer and has absolute value $< 1$. A change of variables in the product shows that $\beta$ is fixed by every member of $\Gamma$, and we see from the previous paragraph that $\beta$ is in $\mathbb{Q}$. By Lemma 7.30, $\beta$ is in $\mathbb{Z}$. Being of absolute value less than 1, it is 0. Thus $\alpha = 0$, and $\chi(x) = 0$. □
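Lemma 7.34 can be observed on $S_4$, assuming its standard character table (classes of sizes 1, 6, 3, 8, 6 for the identity, the transpositions, the double transpositions, the 3-cycles, and the 4-cycles). Whenever $\mathrm{GCD}(|C|, \dim V) = 1$, the value $\chi(C)$ should be either 0 or of absolute value $\dim V$; the latter happens exactly when $R(x)$ is scalar, because $\chi(x)$ is a sum of $\dim V$ roots of unity.

```python
from math import gcd

# Standard character table of S_4 (assumed). Classes: identity,
# transpositions, double transpositions, 3-cycles, 4-cycles.
class_sizes = [1, 6, 3, 8, 6]
table = [
    [1, 1, 1, 1, 1],
    [1, -1, 1, 1, -1],
    [2, 0, 2, -1, 0],
    [3, 1, -1, 0, -1],
    [3, -1, -1, 0, 1],
]

checked = []
for chi in table:
    d = chi[0]
    for size, value in zip(class_sizes, chi):
        if gcd(size, d) == 1:
            # |chi(x)| = dim V exactly when R(x) is scalar; otherwise chi(x) = 0
            assert value == 0 or abs(value) == d
            checked.append((d, size, value))
```

For instance, the 3-cycles ($|C| = 8$) meet the 3-dimensional characters with value 0, while the double transpositions ($|C| = 3$) act as scalars in the 2-dimensional representation.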

Lemma 7.35. Let $G$ be a finite group, and let $C$ be a conjugacy class in $G$ such that $|C| = p^k$ for some prime $p$ and some integer $k > 0$. Then there exists an irreducible finite-dimensional representation $R \ne 1$ of $G$ with $R(x)$ scalar for every $x$ in $C$. Consequently $G$ is not simple.

PROOF. The conjugacy class $C$ cannot be $\{1\}$ because $|\{1\}| \ne p^k$ with $k > 0$. Let $\chi_{\mathrm{reg}}$ be the character of the right regular representation $r$ of $G$ on $C(G, \mathbb{C})$. If $I_g$ denotes the function that is 1 at $g$ and is 0 elsewhere, then the functions $I_g$ form an orthonormal basis of $C(G, \mathbb{C})$, and therefore $\chi_{\mathrm{reg}}(x) = \sum_{g\in G}(r(x)I_g, I_g) = \sum_{g\in G}(I_{gx^{-1}}, I_g)$. Every term on the right side is 0 if $x \ne 1$, and thus Theorem 7.23 gives
$$0 = \chi_{\mathrm{reg}}(x) = 1 + \sum_{\chi \ne 1} d_\chi\chi(x) \qquad\text{for } x \in C, \qquad (*)$$
the sum being taken over all irreducible characters other than 1, with $d_\chi$ being the dimension of an irreducible representation corresponding to $\chi$. Let $R_\chi$ be an irreducible representation with character $\chi$. Any $\chi$ such that $p$ does not divide $d_\chi$ has $\mathrm{GCD}(|C|, d_\chi) = 1$ since $|C|$ is assumed to be a power of $p$. Arguing by contradiction, we may assume that no such $\chi \ne 1$ has $R_\chi(x)$ scalar, and then Lemma 7.34 says that $\chi(x) = 0$ for all such $\chi$. Hence $(*)$ simplifies to
$$0 = 1 + \sum_{\substack{\chi \ne 1 \\ p \mid d_\chi}} d_\chi\chi(x) \qquad\text{for } x \in C. \qquad (**)$$
Since each $\chi(x)$ is an algebraic integer, Lemma 7.30 shows that this equation is of the form $1 + p\beta = 0$, where $\beta$ is an algebraic integer. Then $\beta = -1/p$ shows that $-1/p$ is an algebraic integer. Since $-1/p$ is in $\mathbb{Q}$, Lemma 7.30 shows that it must be in $\mathbb{Z}$, and we have arrived at a contradiction. Thus there must have been some $\chi \ne 1$ with $R_\chi(x)$ scalar for $x$ in $C$.

The set of $g$ in $G$ for which this $R_\chi$ has $R_\chi(g)$ scalar is a normal subgroup of $G$ that contains $x$ and cannot therefore be $\{1\}$. Assume by way of contradiction that $G$ is simple. Then $R_\chi(g)$ is scalar for all $g$ in $G$. Since $R_\chi$ is irreducible, $R_\chi$ is 1-dimensional. Then the commutator subgroup $G'$ of $G$ is contained in the kernel of $R_\chi$. Since $R_\chi \ne 1$, $G'$ is not all of $G$. Since $G'$ is normal, $G' = \{1\}$, and we conclude that $G$ is abelian. But the given $G$ has a conjugacy class with more than one element, and we have arrived at a contradiction. □
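Formula $(*)$ and the conclusion of Lemma 7.35 can be seen on $S_4$, assuming its standard character table (classes of sizes 1, 6, 3, 8, 6 for the identity, the transpositions, the double transpositions, the 3-cycles, and the 4-cycles). The sketch checks that $\sum_\chi d_\chi\chi(x)$ equals $|G|$ at $x = 1$ and 0 elsewhere. Then, for the class of double transpositions, whose size $3 = 3^1$ is a prime power, it lists the nontrivial irreducible characters with $|\chi(x)| = \dim V$, that is, those with $R_\chi(x)$ scalar.

```python
# Standard character table of S_4 (assumed). Classes: identity,
# transpositions, double transpositions, 3-cycles, 4-cycles.
class_sizes = [1, 6, 3, 8, 6]
table = [
    [1, 1, 1, 1, 1],     # trivial
    [1, -1, 1, 1, -1],   # sign
    [2, 0, 2, -1, 0],    # 2-dimensional, through S_4 -> S_3
    [3, 1, -1, 0, -1],   # standard
    [3, -1, -1, 0, 1],   # standard tensor sign
]
order = sum(class_sizes)

# chi_reg(x) = sum over irreducible chi of d_chi chi(x), with d_chi = chi(1)
regular = [sum(chi[0] * chi[k] for chi in table) for k in range(len(class_sizes))]
assert regular == [order, 0, 0, 0, 0]

# Lemma 7.35 with C = double transpositions (|C| = 3):
# nontrivial characters whose representation is scalar on C
double_transpositions = 2
scalar_on_C = [chi for chi in table[1:]
               if abs(chi[double_transpositions]) == chi[0]]
assert scalar_on_C == [[1, -1, 1, 1, -1], [2, 0, 2, -1, 0]]
```

Either of these representations exhibits a proper normal subgroup (its kernel), in line with the conclusion that $S_4$ is not simple.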

Proof of Theorem 7.32. Corollary 4.38 shows that a group of prime-power order has a center different from $\{1\}$, and we may therefore assume that $p \ne q$, $a > 0$, and $b > 0$. Let $H$ be a Sylow $q$-subgroup. Applying Corollary 4.38, let $x$ be a member of the center $Z_H$ of $H$ other than 1. The centralizer $Z_G(\{x\})$ is a subgroup containing $H$, and it therefore has order $p^{a'}q^b$ for some $a' \le a$. If $a' = a$, then $x$ is in the center of $G$, and the powers of $x$ form the desired proper normal subgroup of $G$. Thus we may assume that $a' < a$. By Proposition 4.37 the conjugacy class $C$ of $x$ has $|G|/p^{a'}q^b = p^{a-a'}$ elements with $a - a' > 0$. By Lemma 7.35, $G$ is not simple. □


6.  Extensions  of  Groups 

In  Section  IV.8  we  examined  composition  series  for  finite  groups.  For  a  given 
finite  group,  a  composition  series  consists  of  a  decreasing  sequence  of  subgroups 
starting with the whole group and ending with $\{1\}$, each normal in the next larger
one, such that the successive quotient groups are simple. The Jordan-Hölder
Theorem  (Corollary  4.50)  assured  us  that  the  set  of  successive  quotients,  up  to 
isomorphism,  is  independent  of  the  choice  of  composition  series.  This  theorem 
raises  the  question  of  reconstructing  the  whole  group  from  data  of  this  kind. 
Consider  a  single  step  of  the  process.  If  we  know  the  normal  subgroup  and  the 
simple  quotient  that  it  yields  at  a  certain  stage,  what  are  the  possibilities  for  the 
next-larger  subgroup?  We  study  this  question  and  some  of  its  ramifications  in 
this  section,  dropping  any  hypotheses  that  are  not  helpful  in  the  analysis.  Here  is 
an  example  that  we  shall  carry  along. 

Example. Suppose that the normal subgroup is the cyclic group $C_4$ and that the quotient is the cyclic group $C_2$. The whole group has to be of order 8, and the classification of groups of order 8 done in Problems 39-44 at the end of Chapter IV tells us that there are four different possibilities for the whole group: the abelian groups $C_4 \times C_2$ and $C_8$, the dihedral group $D_4$, and the quaternion group $H_8$.

Let us establish a framework for the general problem. We start with a group $E$, a normal subgroup $N$, and the quotient $G = E/N$. We seek data that determine the group law in $E$ in terms of $N$ and $G$. For each member $u$ of $G$, fix a coset representative $\bar u$ in $E$ such that $\bar uN = u$. Since $N$ is normal, the element $\bar u$ of $E$ yields an automorphism $(\,\cdot\,)^u$ of $N$ defined by $x^u = \bar ux\bar u^{-1}$. In addition, the fact that $G$ is a group says that any two of our representatives $\bar u$ and $\bar v$ have
$$\bar u\bar v = a(u, v)\overline{uv} \qquad\text{for some unique } a(u, v) \text{ in } N.$$
The set of all elements $a(u, v)$ for this choice of coset representatives is called a factor set, and $E$ is called a group extension of $N$ by the group$^{3}$ $G$.

The automorphisms and the factor set constructed above have to satisfy two compatibility conditions, as follows:

(i) $(x^v)^u = a(u, v)x^{uv}a(u, v)^{-1}$ because $(x^v)^u = \bar u(\bar vx\bar v^{-1})\bar u^{-1} = \bar u\bar vx\bar v^{-1}\bar u^{-1} = (a(u, v)\overline{uv})x(a(u, v)\overline{uv})^{-1} = a(u, v)x^{uv}a(u, v)^{-1}$,

(ii) $a(v, w)^ua(u, vw) = a(u, v)a(uv, w)$ because $(\bar u\bar v)\bar w = a(u, v)\overline{uv}\,\bar w = a(u, v)a(uv, w)\overline{uvw}$ and $\bar u(\bar v\bar w) = \bar ua(v, w)\overline{vw} = a(v, w)^u\bar u\,\overline{vw} = a(v, w)^ua(u, vw)\overline{uvw}$.

Then the multiplication law in $E$ is given in terms of the automorphisms and the factor set by the formula

(iii) $(x\bar u)(y\bar v) = xy^ua(u, v)\overline{uv}$ by the computation $(x\bar u)(y\bar v) = xy^u\bar u\bar v = xy^ua(u, v)\overline{uv}$.

Conversely, according to the proposition below, such data determine a group $E$ with a normal subgroup isomorphic to $N$ and a quotient $E/N$ isomorphic to $G$.

Proposition 7.36 (Schreier). Let two groups $N$ and $G$ be given, along with a family of automorphisms $x \mapsto x^u$ of $N$ parametrized by $u$ in $G$, as well as a function $a: G \times G \to N$ such that

(a) $(x^v)^u = a(u, v)x^{uv}a(u, v)^{-1}$ for all $u$ and $v$ in $G$,

(b) $a(v, w)^ua(u, vw) = a(u, v)a(uv, w)$ for all $u, v, w$ in $G$.

Then the set $N \times G$ becomes a group $E$ under the multiplication

(c) $(x, u)(y, v) = (xy^ua(u, v), uv)$,

and this group has a normal subgroup isomorphic to $N$ with quotient group isomorphic to $G$. More particularly, the identity of $E$ is $(a(1, 1)^{-1}, 1)$, the map $x \mapsto (xa(1, 1)^{-1}, 1)$ of $N$ into $E$ is a one-one homomorphism that exhibits $N$ as a normal subgroup of $E$, and the map $(x, u) \mapsto u$ of $E$ onto $G$ is a homomorphism that exhibits $G$ as isomorphic to $E/N$.

$^{3}$Warning: Some authors say "group extension of $G$ by $N$."

PROOF. Reverting to the earlier notation, let us write $x\bar u$ in place of $(x, u)$ for elements of $E$. Associativity of multiplication follows from the computation
$$\begin{aligned}
(x\bar u\,y\bar v)(z\bar w) &= (xy^ua(u, v)\overline{uv})z\bar w && \text{by (c)} \\
&= xy^ua(u, v)z^{uv}a(uv, w)\overline{uvw} && \text{by (c)} \\
&= xy^ua(u, v)z^{uv}a(u, v)^{-1}a(u, v)a(uv, w)\overline{uvw} \\
&= xy^ua(u, v)z^{uv}a(u, v)^{-1}a(v, w)^ua(u, vw)\overline{uvw} && \text{by (b)} \\
&= x(yz^va(v, w))^ua(u, vw)\overline{uvw} && \text{by (a)} \\
&= (x\bar u)(yz^va(v, w)\overline{vw}) && \text{by (c)} \\
&= (x\bar u)(y\bar v\,z\bar w) && \text{by (c)}.
\end{aligned}$$
The identity is to be $a(1, 1)^{-1}\bar 1$. Before checking this assertion, we prove three preliminary identities. Setting $u = v = 1$ in (a) and replacing $x^1$ by $x$ gives$^{4}$
$$x^1 = a(1, 1)xa(1, 1)^{-1} \qquad\text{for all } x \in N. \qquad (*)$$
Setting $v = w = 1$ in (b) gives $a(1, 1)^ua(u, 1) = a(u, 1)a(u, 1)$ and hence
$$a(1, 1)^u = a(u, 1) \qquad\text{for all } u \in G. \qquad (\dagger)$$
Meanwhile, setting $u = v = 1$ in (b) gives $a(1, w)^1a(1, w) = a(1, 1)a(1, w)$ and hence $a(1, w)^1 = a(1, 1)$ for all $w \in G$. The left side $a(1, w)^1$ of this last equality is equal to $a(1, 1)a(1, w)a(1, 1)^{-1}$ by $(*)$; canceling $a(1, 1)$ yields
$$a(1, w) = a(1, 1) \qquad\text{for all } w \in G. \qquad (\dagger\dagger)$$
Using these identities, we check that $a(1, 1)^{-1}\bar 1$ is a two-sided identity by making the computations
$$\begin{aligned}
(x\bar u)(a(1, 1)^{-1}\bar 1) &= x(a(1, 1)^{-1})^ua(u, 1)\bar u && \text{by (c)} \\
&= x(a(1, 1)^{-1})^ua(1, 1)^u\bar u && \text{by } (\dagger) \\
&= x\bar u
\end{aligned}$$
and
$$\begin{aligned}
(a(1, 1)^{-1}\bar 1)(y\bar v) &= a(1, 1)^{-1}y^1a(1, v)\bar v && \text{by (c)} \\
&= ya(1, 1)^{-1}a(1, v)\bar v && \text{by } (*) \\
&= y\bar v && \text{by } (\dagger\dagger).
\end{aligned}$$
Let us check that a left inverse for $x\bar u$ is $a(1, 1)^{-1}a(u^{-1}, u)^{-1}(x^{u^{-1}})^{-1}\overline{u^{-1}}$. In fact, by (c),
$$(a(1, 1)^{-1}a(u^{-1}, u)^{-1}(x^{u^{-1}})^{-1}\overline{u^{-1}})(x\bar u) = a(1, 1)^{-1}a(u^{-1}, u)^{-1}(x^{u^{-1}})^{-1}x^{u^{-1}}a(u^{-1}, u)\bar 1 = a(1, 1)^{-1}\bar 1,$$
as required. Thus multiplication is associative, there is a two-sided identity, and every element has a left inverse. It follows that $E$ is a group.

The map $x\bar u \mapsto u$ of $E$ into $G$ is a homomorphism by (c), and it is certainly onto $G$. Its kernel is evidently the subgroup of all elements $xa(1, 1)^{-1}\bar 1$ in $E$. Since
$$\begin{aligned}
(xa(1, 1)^{-1}\bar 1)(ya(1, 1)^{-1}\bar 1) &= xa(1, 1)^{-1}(ya(1, 1)^{-1})^1a(1, 1)\bar 1 && \text{by (c)} \\
&= xa(1, 1)^{-1}a(1, 1)(ya(1, 1)^{-1})\bar 1 && \text{by } (*) \\
&= xya(1, 1)^{-1}\bar 1,
\end{aligned}$$
the one-one map $x \mapsto xa(1, 1)^{-1}\bar 1$ of $N$ onto the kernel respects the group structures and is therefore an isomorphism. In other words, the embedded version of $N$ is the kernel. Being a kernel, it is a normal subgroup. □

$^{4}$The effect of the automorphism $x \mapsto x^1$ is not necessarily trivial since the coset representative $\bar 1$ of 1 is not assumed to be the identity. Thus we must distinguish between $x^1$ and $x$.

Example, continued. Let $N = C_4 = \{1, r, r^2, r^3\}$ and $G = C_2 = \{1, u_0\}$ with $u_0^2 = 1$. The group $N$ has two automorphisms, the nontrivial one fixing 1 and $r^2$ while interchanging $r$ and $r^3$. The automorphism of $N$ from $1 \in G$ has to be trivial, while the automorphism of $N$ from $u_0 \in G$ can be trivial or nontrivial. In fact, the automorphism is
$$\begin{cases} \text{trivial} & \text{for } E = C_4 \times C_2 \text{ and } E = C_8, \\ \text{nontrivial} & \text{for } E = D_4 \text{ and } E = H_8. \end{cases}$$
In each case the automorphism does not depend on the choice of coset representatives. The factor sets do depend on the choice of representatives, however. Let us fix $\bar 1$ as the identity of $E$ and make a particular choice of $\bar u_0$ for each $E$. Then the definition of factor set shows that $a(1, 1) = a(u_0, 1) = a(1, u_0) = 1$, and the only part of the factor set yet to be determined is $a(u_0, u_0)$. Let us consider matters group by group. For $C_4 \times C_2$, we can take $\bar u_0$ to be the generator of the $C_2$ factor; this has square 1, and hence $a(u_0, u_0) = 1$. For $C_8 = \{1, \theta, \theta^2, \dots, \theta^7\}$, let us think of $N$ as embedded in $E$ with $r = \theta^2$. The element $\bar u_0$ can be any odd power of $\theta$; if we take $\bar u_0 = \theta$, then $(\bar u_0)^2 = \theta^2 = r$, and hence $a(u_0, u_0) = r$. For $E = D_4$, the example following Proposition 7.8 shows that we may view the elements as the rotations $1, r, r^2, r^3$ and the reflections $s, rs, r^2s, r^3s$ for particular choices of $r$ and $s$. We can take $\bar u_0$ to be any of the reflections, and then $(\bar u_0)^2 = 1$ and $a(u_0, u_0) = 1$. Finally for $E = H_8 = \{\pm1, \pm i, \pm j, \pm k\}$, let us say that $N$ is embedded as $\{\pm1, \pm i\}$. Then $\bar u_0$ can be any of the four elements $\pm j$ and $\pm k$. Each of these has square $-1$, and hence $a(u_0, u_0) = -1$. For the choices we have made, we therefore have
$$a(u_0, u_0) = \begin{cases} 1 & \text{for } E = C_4 \times C_2 \text{ and } E = D_4, \\ r & \text{for } E = C_8, \\ -1 & \text{for } E = H_8. \end{cases}$$

The formula of Proposition 7.36a reduces to $(x^v)^u = x^{uv}$ since $N$ is abelian, and it is certainly satisfied. The formula of Proposition 7.36b is $a(v, w)^ua(u, vw) = a(u, v)a(uv, w)$. This is satisfied for $E = C_4 \times C_2$ and $E = D_4$ since $a(\cdot, \cdot)$ is identically 1. In the other two cases we still have $a(v, w)^u = a(v, w)$ for all $u, v, w$: for $E = C_8$ the automorphisms are trivial, and for $E = H_8$ the values of $a(\cdot, \cdot)$ lie in the 2-element subgroup $\{1, r^2\} = \{\pm 1\}$ of $N$ that is fixed by the nontrivial automorphism. The formula to be checked reduces to $a(v, w)a(1, 1) = a(1, 1)a(v, w)$ by $(\dagger\dagger)$ if $u = 1$, to $a(1, 1)a(u, w) = a(1, 1)a(u, w)$ by $(\dagger)$ and $(\dagger\dagger)$ if $v = 1$, and to $a(1, 1)a(u, v) = a(u, v)a(1, 1)$ by $(\dagger)$ if $w = 1$. Thus all that needs checking is the case that $u = v = w = u_0$, and then the formula in question reduces to $a(u_0, u_0)a(1, 1) = a(u_0, u_0)a(1, 1)$ by $(\dagger)$ and $(\dagger\dagger)$.
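The four sets of data above can be fed into the multiplication rule (c) of Proposition 7.36 and the resulting groups compared. In the Python sketch below, $N = C_4$ is written additively as $\mathbb{Z}/4$ with $r = 1$, $G = C_2$ as $\mathbb{Z}/2$ with $u_0 = 1$, the nontrivial automorphism is $y \mapsto -y$, and $a(u_0, u_0)$ is $0$, $1$, or $2$ for $1$, $r$, or $-1 = r^2$. The code verifies the group axioms by brute force and then lists element orders, which distinguish the four groups of order 8 that occur. The helper names are ours.

```python
from itertools import product


def make_extension(m, twist, a):
    """Multiplication (c) of Proposition 7.36 on Z/m x Z/2, written additively:
    (x, u)(y, v) = (x + y^u + a(u, v), u + v), where y^u = twist[u] * y."""
    def mul(g, h):
        (x, u), (y, v) = g, h
        return ((x + twist[u] * y + a[(u, v)]) % m, (u + v) % 2)
    return mul


def check_group(elements, mul, identity):
    for g, h, k in product(elements, repeat=3):
        assert mul(mul(g, h), k) == mul(g, mul(h, k))        # associativity
    for g in elements:
        assert mul(g, identity) == g == mul(identity, g)     # two-sided identity
        assert any(mul(h, g) == identity for h in elements)  # left inverse


def element_orders(elements, mul, identity):
    orders = []
    for g in elements:
        k, power = 1, g
        while power != identity:
            power, k = mul(power, g), k + 1
        orders.append(k)
    return sorted(orders)


elements = [(x, u) for x in range(4) for u in range(2)]
identity = (0, 0)  # (a(1,1)^{-1}, 1) with a(1,1) = 0 in additive notation
trivial, flip = {0: 1, 1: 1}, {0: 1, 1: -1}  # flip swaps r and r^3
cases = {
    "C4 x C2": (trivial, 0),
    "C8": (trivial, 1),   # a(u0, u0) = r
    "D4": (flip, 0),
    "H8": (flip, 2),      # a(u0, u0) = r^2, i.e. -1 when N = {1, -1, i, -i}
}
results = {}
for name, (twist, c) in cases.items():
    a = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): c}
    mul = make_extension(4, twist, a)
    check_group(elements, mul, identity)
    results[name] = element_orders(elements, mul, identity)

assert results["C4 x C2"] == [1, 2, 2, 2, 4, 4, 4, 4]
assert results["C8"] == [1, 2, 4, 4, 8, 8, 8, 8]
assert results["D4"] == [1, 2, 2, 2, 2, 2, 4, 4]
assert results["H8"] == [1, 2, 4, 4, 4, 4, 4, 4]
```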


Let us examine for a particular extension the dependence of the automorphisms and factor set on the choice of coset representatives. Returning to our original construction, suppose that we change the coset representatives of the members of $G$, associating a member $\tilde u$ to $u \in G$ in place of $\bar u$. We then obtain a new automorphism of $N$ corresponding to $u$, and we write it as $x \mapsto x^{\tilde u} = \tilde ux\tilde u^{-1}$ instead of $x \mapsto x^u = \bar ux\bar u^{-1}$. To quantify matters, we observe that $\tilde u$ lies in the same coset of $N$ as does $\bar u$. Thus $\tilde u = \alpha(u)\bar u$ for some function $\alpha: G \to N$, and the function $\alpha$ can be absolutely arbitrary. In terms of this function $\alpha$, the two automorphisms are related by
$$x^{\tilde u} = \tilde ux\tilde u^{-1} = \alpha(u)\bar ux\bar u^{-1}\alpha(u)^{-1} = \alpha(u)x^u\alpha(u)^{-1}.$$
If the factor set for the system $\{\tilde u\}$ of coset representatives is denoted by $\{b(u, v)\}$, then we have $b(u, v)\alpha(uv)\overline{uv} = b(u, v)\widetilde{uv} = \tilde u\tilde v = \alpha(u)\bar u\alpha(v)\bar v = \alpha(u)\alpha(v)^ua(u, v)\overline{uv}$. Equating coefficients of $\overline{uv}$, we obtain
$$b(u, v) = \alpha(u)\alpha(v)^ua(u, v)\alpha(uv)^{-1}.$$

Accordingly we say that a group extension of $N$ by $G$ determined by automorphisms $x \mapsto x^u$ and a factor set $a(u, v)$ is equivalent, or isomorphic, to a group extension of $N$ by $G$ determined by automorphisms $x \mapsto x^{\tilde u}$ and a factor set $b(u, v)$ if there is a function $\alpha: G \to N$ such that
$$x^{\tilde u} = \alpha(u)x^u\alpha(u)^{-1} \qquad\text{and}\qquad b(u, v) = \alpha(u)\alpha(v)^ua(u, v)\alpha(uv)^{-1}$$
for all $u$ and $v$ in $G$. It is immediate that equivalence of group extensions is an equivalence relation.
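The change-of-representatives formula can be checked on the $C_8$ case of the running example. In this sketch $N = \mathbb{Z}/4$ (with $r = 1$) and $G = \mathbb{Z}/2$ (with $u_0 = 1$) are written additively, so that $b(u, v) = \alpha(u) + \alpha(v)^u + a(u, v) - \alpha(uv)$. Taking $\alpha(u_0) = r$ replaces $\bar u_0 = \theta$ by $r\theta = \theta^3$, and the formula gives $b(u_0, u_0) = r^3$, in agreement with $(\theta^3)^2 = \theta^6 = r^3$. Taking instead $\alpha(1) = r$ makes the representative of 1 differ from the identity, so $b(1, 1) \ne 1$ and the identity of the group built by Proposition 7.36 is $(b(1, 1)^{-1}, 1)$ rather than $(1, 1)$. In every case the element orders come out as those of $C_8$. The function names are ours.

```python
from itertools import product

M = 4                      # N = Z/4 written additively, r = 1
G = (0, 1)                 # G = Z/2, u0 = 1
twist = {0: 1, 1: 1}       # C_8 case: both automorphisms trivial
a = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}   # a(u0, u0) = r


def change_representatives(alpha):
    """b(u, v) = alpha(u) alpha(v)^u a(u, v) alpha(uv)^{-1}, written additively.
    N is abelian, so the automorphisms alpha(u) x^u alpha(u)^{-1} are unchanged."""
    return {(u, v): (alpha[u] + twist[u] * alpha[v] + a[(u, v)]
                     - alpha[(u + v) % 2]) % M
            for u in G for v in G}


def orders(b):
    """Element orders of the group built from b by Proposition 7.36."""
    def mul(g, h):
        return ((g[0] + twist[g[1]] * h[0] + b[(g[1], h[1])]) % M,
                (g[1] + h[1]) % 2)
    elements = [(x, u) for x in range(M) for u in G]
    e = (-b[(0, 0)] % M, 0)    # identity (b(1,1)^{-1}, 1)
    assert all(mul(g, e) == g == mul(e, g) for g in elements)
    assert all(mul(mul(g, h), k) == mul(g, mul(h, k))
               for g, h, k in product(elements, repeat=3))
    result = []
    for g in elements:
        n, power = 1, g
        while power != e:
            power, n = mul(power, g), n + 1
        result.append(n)
    return sorted(result)


b1 = change_representatives({0: 0, 1: 1})  # theta replaced by r theta = theta^3
assert b1[(1, 1)] == 3                      # (theta^3)^2 = theta^6 = r^3
b2 = change_representatives({0: 1, 1: 0})  # representative of 1 is now r
assert b2[(0, 0)] == 1                      # so b(1, 1) is no longer trivial
assert orders(a) == orders(b1) == orders(b2) == [1, 2, 4, 4, 8, 8, 8, 8]
```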

Proposition 7.37. Suppose that $E_1$ and $E_2$ are group extensions of $N$ by $G$ with respective inclusions $i_1: N \to E_1$ and $i_2: N \to E_2$ and with respective quotient homomorphisms $\varphi_1: E_1 \to G$ and $\varphi_2: E_2 \to G$. If there exists a group isomorphism $\Phi: E_1 \to E_2$ such that the two squares in Figure 7.4 commute, then the two group extensions are equivalent. Conversely if the two group extensions are equivalent, then there exists a group isomorphism $\Phi: E_1 \to E_2$ such that the two squares in Figure 7.4 commute.

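[Figure 7.4: the rows $N \xrightarrow{i_1} E_1 \xrightarrow{\varphi_1} G$ and $N \xrightarrow{i_2} E_2 \xrightarrow{\varphi_2} G$, joined by the vertical maps $1$, $\Phi$, and $1$.]

Figure 7.4. Equivalent group extensions.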

Remarks.  The  commutativity  of  the  squares  is  important.  Just  because  two 
group  extensions  of  N  by  G  are  isomorphic  as  groups  does  not  imply  that  they 
are  equivalent  group  extensions.  An  example  is  given  in  Problem  19  at  the  end 
of  the  chapter. 

PROOF. For the direct part, suppose that Φ exists. For each u in G, select $\bar u$ in $E_1$ with $\varphi_1(\bar u) = u$. Then we can form the extension data $\{x \mapsto x^u\}$ and $\{a(u,v)\}$ for $E_1$ relative to the normal subgroup $i_1(N)$ and the system $\{\bar u \mid u \in G\}$ of coset representatives. When reinterpreted in terms of N, $E_1$, and G, these data become $\{i_1^{-1}(x) \mapsto i_1^{-1}(x^u)\}$ and $\{i_1^{-1}(a(u,v))\}$.

Application of Φ to the coset $i_1(N)\bar u$ yields $i_2(N)\Phi(\bar u)$ since $\Phi i_1 = i_2$, and $\Phi(\bar u)$ is a member of $E_2$ with $\varphi_2(\Phi(\bar u)) = \varphi_1(\bar u) = u$. Setting $\tilde u = \Phi(\bar u)$, we see that $\Phi(i_1(N)\bar u)$ is the coset $i_2(N)\tilde u$ of $i_2(N)$ in $E_2$. Thus we can determine extension data for $E_2$ relative to $i_2(N)$ and the system $\{\tilde u \mid u \in G\}$, and we can transform them by $i_2^{-1}$ to obtain data relative to N, $E_2$, and G.

The claim is that the data relative to N, $E_2$, and G match those for N, $E_1$, and G. The automorphisms of N from $E_2$ are the maps $i_2^{-1}(x') \mapsto i_2^{-1}(x'^{u*})$, where $x'^{u*} = \tilde u x'\tilde u^{-1}$. From $i_2 = \Phi i_1$ and the fact that each of these maps is one-one, we obtain $i_2^{-1} = i_1^{-1}\Phi^{-1}$ on $i_2(N)$. Substitution shows that the automorphisms of N from $E_2$ are
$$\begin{aligned}
i_1^{-1}(\Phi^{-1}(x')) \mapsto i_1^{-1}(\Phi^{-1}(x'^{u*})) &= i_1^{-1}(\Phi^{-1}(\tilde u x'\tilde u^{-1}))\\
&= i_1^{-1}(\bar u\,\Phi^{-1}(x')\,\bar u^{-1}) = i_1^{-1}((\Phi^{-1}(x'))^u).
\end{aligned}$$
If we set $x' = \Phi(x)$ with x in $i_1(N)$, then the automorphisms of N from $E_2$ take the form $i_1^{-1}(x) \mapsto i_1^{-1}(x^u)$. Thus they match the automorphisms of N from $E_1$.

In the case of the factor sets, we have $\bar u\bar v = a(u,v)\overline{uv}$. Application of Φ gives $\tilde u\tilde v = \Phi(a(u,v))\widetilde{uv}$. Thus the factor set for $E_2$ relative to N is $\{i_2^{-1}\Phi(a(u,v))\}$. Since $i_2^{-1}\Phi = i_1^{-1}$, this matches the factor set for $E_1$ relative to N.

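We turn to the converse part. Suppose that the multiplication law in $E_1$ is $(i_1(x)\bar u)(i_1(y)\bar v) = i_1(x)\,i_1(y)^{\bar u}\,i_1(a(u,v))\,\overline{uv}$ for x and y in N, and that the multiplication law in $E_2$ is $(i_2(x)\tilde u)(i_2(y)\tilde v) = i_2(x)\,i_2(y)^{\tilde u}\,i_2(b(u,v))\,\widetilde{uv}$. Here $\bar u$ and $\bar v$ are preimages of u and v under $\varphi_1$, and $\tilde u$ and $\tilde v$ are preimages of u and v under $\varphi_2$. Define automorphisms of N by $x^u = i_1^{-1}(i_1(x)^{\bar u})$ and $x^{u*} = i_2^{-1}(i_2(x)^{\tilde u})$. We can then rewrite the multiplication laws as
$$\begin{aligned}
(i_1(x)\bar u)(i_1(y)\bar v) &= i_1(xy^u a(u,v))\,\overline{uv}\\
\text{and}\quad (i_2(x)\tilde u)(i_2(y)\tilde v) &= i_2(xy^{u*} b(u,v))\,\widetilde{uv}.
\end{aligned}$$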

The assumption that $E_1$ is equivalent to $E_2$ as an extension of N by G means that there exists a function $\alpha : G \to N$ such that
$$x^{u*} = \alpha(u)x^u\alpha(u)^{-1} \quad\text{and}\quad b(u,v) = \alpha(u)\alpha(v)^u a(u,v)\alpha(uv)^{-1}$$
for all u and v in G. Define $\Phi : E_1 \to E_2$ by
$$\Phi(i_1(x)\bar u) = i_2(x\alpha(u)^{-1})\,\tilde u.$$

Certainly Φ is one-one onto. It remains to check that Φ is a group homomorphism and that the squares commute in Figure 7.4.

To check that $\Phi : E_1 \to E_2$ is a group homomorphism, we compare
$$\Phi(i_1(x)\bar u\,i_1(y)\bar v) = \Phi(i_1(xy^u a(u,v))\,\overline{uv}) = i_2(xy^u a(u,v)\alpha(uv)^{-1})\,\widetilde{uv}$$




VII.  Advanced  Group  Theory 


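with the product
$$\begin{aligned}
\Phi(i_1(x)\bar u)\,\Phi(i_1(y)\bar v) &= i_2(x\alpha(u)^{-1})\,\tilde u\; i_2(y\alpha(v)^{-1})\,\tilde v\\
&= i_2\big(x\alpha(u)^{-1}(y\alpha(v)^{-1})^{u*} b(u,v)\big)\,\widetilde{uv}.
\end{aligned}$$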


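Since
$$\begin{aligned}
\alpha(u)^{-1}(y\alpha(v)^{-1})^{u*} b(u,v) &= \alpha(u)^{-1}(y\alpha(v)^{-1})^{u*}\alpha(u)\alpha(v)^u a(u,v)\alpha(uv)^{-1}\\
&= (y\alpha(v)^{-1})^u\alpha(v)^u a(u,v)\alpha(uv)^{-1}\\
&= y^u a(u,v)\alpha(uv)^{-1},
\end{aligned}$$
where the second equality uses $\alpha(u)^{-1}z^{u*}\alpha(u) = z^u$, these expressions are equal, and Φ is a group homomorphism. Thus Φ is a group isomorphism.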

Now we check the commutativity of the squares. The computation
$$\varphi_2\Phi(i_1(x)\bar u) = \varphi_2(i_2(x\alpha(u)^{-1})\tilde u) = u = \varphi_1(i_1(x)\bar u)$$
shows that the right-hand square commutes.

For the left-hand square we use the fact recorded in the statement of Proposition 7.36 that $i_1(a(1,1)^{-1})\bar 1$ is the identity of $E_1$ and $i_2(b(1,1)^{-1})\tilde 1$ is the identity of $E_2$. Therefore $\Phi i_1(x) = \Phi(i_1(xa(1,1)^{-1})\bar 1) = i_2(xa(1,1)^{-1}\alpha(1)^{-1})\tilde 1$. Since $i_2(x) = i_2(xb(1,1)^{-1})\tilde 1$, the left-hand square commutes if $b(1,1) = \alpha(1)a(1,1)$. This formula follows from (*) in the proof of Proposition 7.36 by the computation
$$b(1,1) = \alpha(1)\alpha(1)^1 a(1,1)\alpha(1)^{-1} = \alpha(1)a(1,1)\alpha(1)\alpha(1)^{-1} = \alpha(1)a(1,1),$$
and thus the left-hand square indeed commutes. □

For the remainder of this section, let us assume that N is abelian. In this case Proposition 7.36a reduces to the identity $(x^v)^u = x^{uv}$ for all u and v in G independently of the choice of representatives, just as it does in the example we studied with $N = C_4$ and $G = C_2$. In the terminology of Section IV.7, G acts on N by automorphisms.⁵ Suppose we fix such an action $\tau : G \to \operatorname{Aut} N$ by automorphisms and consider all extensions of N by G built from τ. In our example we are thus to consider E equal to $C_4 \times C_2$ or $C_8$, which are built with the trivial τ, or else E equal to $D_4$ or $H_8$, which are built with the nontrivial τ (in which the nontrivial element of G acts by the nontrivial automorphism of N).

Since N is abelian, let us switch to additive notation for N and to ordinary function notation for $\tau(u)$, rewriting the formula of Proposition 7.36b as
$$\tau(u)a(v,w) + a(u,vw) = a(u,v) + a(uv,w).$$

⁵ The formula $(x^v)^u = x^{uv}$ correctly corresponds to a group action with the group on the left as in Section IV.7.


6.  Extensions  of  Groups 




This condition is preserved under addition of factor sets as long as τ does not change, it is satisfied by the 0 factor set, and the negative of a factor set is again a factor set. Therefore the factor sets for this τ form an abelian group.

Two factor sets for this τ are equivalent (in the sense of yielding equivalent group extensions) if and only if their difference is equivalent to 0, and $a(u,v)$ is equivalent to 0 if and only if
$$a(u,v) = \alpha(uv) - \alpha(u) - \tau(u)\alpha(v)$$
for some function $\alpha : G \to N$; this is the case $b = 0$ of the additive form $b(u,v) = \alpha(u) + \tau(u)\alpha(v) + a(u,v) - \alpha(uv)$ of the equivalence formula. The set of factor sets for this τ that are equivalent to 0 is thus a subgroup,⁶ and we arrive at the following result.

Proposition 7.38. Let G and N be groups with N abelian, and suppose that $\tau : G \to \operatorname{Aut} N$ is a homomorphism. Then the set of equivalence classes of group extensions of N by G corresponding to the action $\tau : G \to \operatorname{Aut} N$ is parametrized by the quotient of the abelian group of factor sets by the subgroup of factor sets equivalent to 0.

The extension E corresponding to the 0 factor set is of special interest. In this case the multiplication law for the coset representatives is $\bar u\bar v = \overline{uv}$ since the member $a(u,v) = 0$ of N is to be interpreted multiplicatively, as the identity, in this product formula. Consequently the map $u \mapsto \bar u$ of G into E is a group homomorphism, necessarily one-one, and we can regard G as a subgroup of E. Proposition 4.44 allows us to conclude that E is the semidirect product $G \times_\tau N$. The multiplication law for general elements of E, with multiplicative notation used for N, is
$$(x\bar u)(y\bar v) = x(\tau(u)y)\overline{uv}.$$
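The multiplication law of the semidirect product can be tried out directly. The sketch below assumes the running example with $N = C_4$ (written additively, so $x(\tau(u)y)$ becomes $x + \tau(u)y$) and $G = C_2$ acting by inversion, and encodes $x\bar u$ as the pair `(x, u)`; this encoding is an illustration, not notation from the text. It confirms that the law is associative and nonabelian and that five elements have order 2, which identifies the group as $D_4$ rather than $H_8$.

```python
from itertools import product

M = 4  # N = Z/4, written additively; G = C2 = {0, 1}

def tau(u, y):
    """tau(u) in Aut N: the nontrivial element of C2 acts by y -> -y."""
    return (-y) % M if u else y % M

def mul(p, q):
    """(x u-bar)(y v-bar) = x (tau(u) y) (uv)-bar, with N written additively."""
    (x, u), (y, v) = p, q
    return ((x + tau(u, y)) % M, (u + v) % 2)

E = list(product(range(M), range(2)))
IDENTITY = (0, 0)

def order(p):
    k, q = 1, p
    while q != IDENTITY:
        q, k = mul(q, p), k + 1
    return k

associative = all(mul(mul(p, q), r) == mul(p, mul(q, r)) for p in E for q in E for r in E)
abelian = all(mul(p, q) == mul(q, p) for p in E for q in E)
orders = sorted(order(p) for p in E)
print(associative, abelian, orders)  # True False [1, 2, 2, 2, 2, 2, 4, 4]
```

Replacing `tau` by the identity action gives instead the element orders [1, 2, 2, 2, 4, 4, 4, 4] of $C_4 \times C_2$.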

It is possible also to describe explicitly the extension one obtains from the sum of two factor sets corresponding to the same τ, but we leave this matter to Problems 20-22 at the end of the chapter. The operation on extensions that corresponds to addition of factor sets in this way is called Baer multiplication. The previous paragraph shows that the identity element for Baer multiplication is the semidirect product.

The two conditions, the compatibility condition on a factor set given in Proposition 7.36b and the condition with α in it for equivalence to 0, are of a combinatorial type that occurs in many contexts in mathematics and is captured by the ideas of “homology” and “cohomology.” For the current situation the notion is that of cohomology of groups, and we shall define it now. The subject of homological algebra, which is developed in Chapter IV of Advanced Algebra, puts cohomology of groups in a wider context and explains some of its mystery.

⁶ One can legitimately ask whether an arbitrary $\alpha : G \to N$ leads to a factor set under the definition $a(u,v) = \alpha(uv) - \tau(u)\alpha(v) - \alpha(u)$, and one easily checks that the answer is yes. Alternatively, one can refer to the case n = 2 in the upcoming Proposition 7.39.

We fix an abelian group N, a group G, and a group action τ of G on N by automorphisms. It is customary to suppress τ in the notation for the group action, and we shall follow that convention. For integers $n \ge 0$, one begins with the abelian group $C^n(G,N)$ of n-cochains of G with coefficients in N. This is defined by
$$C^n(G,N) = \begin{cases} N & \text{if } n = 0,\\ \{f : \prod_{i=1}^n G \to N\} & \text{if } n > 0.\end{cases}$$


In words, $C^n(G,N)$ is the set of all functions into N from the n-fold direct product of G with itself. The coboundary map $\delta_n : C^n(G,N) \to C^{n+1}(G,N)$ is the homomorphism of abelian groups defined by
$$(\delta_0 f)(g_1) = g_1f - f$$
and by
$$(\delta_n f)(g_1,\dots,g_{n+1}) = g_1(f(g_2,\dots,g_{n+1})) + \sum_{i=1}^n (-1)^i f(g_1,\dots,g_{i-1},g_ig_{i+1},g_{i+2},\dots,g_{n+1}) + (-1)^{n+1}f(g_1,\dots,g_n)$$
for $n > 0$. We postpone to the end of this section the proof of the following result.


Proposition 7.39. $\delta_n\delta_{n-1} = 0$ for all $n \ge 1$.
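Before reading the proof, one can test Proposition 7.39 numerically. The sketch below implements $\delta_n$ exactly as defined above for one hypothetical choice of data: $G = S_3$ (permutations of $\{0,1,2\}$, composed as functions) acting on $N = \mathbb Z/5$ through the sign character. It applies $\delta$ twice to random cochains and checks that the result vanishes. This is a sanity check on one example, not a proof.

```python
from itertools import permutations, product
import random

M = 5                               # hypothetical coefficient group N = Z/5
G = list(permutations(range(3)))    # S3 as permutation tuples

def compose(p, q):
    """Product pq in S3: apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))

def sign(p):
    inversions = sum(1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inversions % 2 else 1

def act(g, x):
    """G acts on Z/M through the sign character: g.x = sign(g) x."""
    return (sign(g) * x) % M

def coboundary(f, n):
    """delta_n f for an n-cochain f, given as a dict from n-tuples in G^n to Z/M
    (the key () holds the single value when n = 0)."""
    out = {}
    for gs in product(G, repeat=n + 1):
        total = act(gs[0], f[gs[1:]])
        for i in range(n):  # 0-based i merges gs[i], gs[i+1]; its sign is (-1)^(i+1)
            merged = gs[:i] + (compose(gs[i], gs[i + 1]),) + gs[i + 2:]
            total += (-1) ** (i + 1) * f[merged]
        total += (-1) ** (n + 1) * f[gs[:n]]
        out[gs] = total % M
    return out

def delta_squared_vanishes(n, seed=0):
    """Apply delta_n then delta_{n+1} to a random n-cochain and test for zero."""
    rng = random.Random(seed)
    f = {gs: rng.randrange(M) for gs in product(G, repeat=n)}
    return all(v == 0 for v in coboundary(coboundary(f, n), n + 1).values())

print([delta_squared_vanishes(n) for n in range(3)])  # [True, True, True]
```

Storing cochains as dictionaries keyed by tuples keeps the code close to the formula; it is only practical for very small groups, since $C^n(G,N)$ has $|N|^{|G|^n}$ elements.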


It follows from Proposition 7.39 that $\operatorname{image}\delta_{n-1} \subseteq \ker\delta_n$ for all $n \ge 1$. Thus if we define abelian groups by
$$Z^n(G,N) = \ker\delta_n, \qquad B^n(G,N) = \begin{cases} 0 & \text{for } n = 0,\\ \operatorname{image}\delta_{n-1} & \text{for } n > 0,\end{cases}$$
then $B^n(G,N) \subseteq Z^n(G,N)$ for all n, and it makes sense to define the abelian groups
$$H^n(G,N) = Z^n(G,N)/B^n(G,N) \qquad\text{for } n \ge 0.$$
The elements of $Z^n(G,N)$ are called n-cocycles, the elements of $B^n(G,N)$ are called n-coboundaries, and $H^n(G,N)$ is called the nth cohomology group of G with coefficients in N.


Examples  in  low  degree. 

Degree 0. Here $(\delta_0 f)(u) = uf - f$ with f in N and u in G. The cocycle condition is that this is 0 for all u. Thus f is to be fixed by G. We say that an f fixed by G is an invariant of the group action. The space of invariants is denoted by $N^G$. By the convention above, we are taking $B^0(G,N) = 0$. Thus
$$H^0(G,N) = N^G.$$

Degree 1. Here $(\delta_1 f)(u,v) = u(f(v)) - f(uv) + f(u)$ with f a function from G to N. The cocycle condition is that
$$f(uv) = f(u) + u(f(v)) \qquad\text{for all } u, v \in G.$$
A function f satisfying this condition is called a crossed homomorphism of G into N. A coboundary is a function $f : G \to N$ of the form $f(u) = (\delta_0 x)(u) = ux - x$ for some $x \in N$. Then $H^1(G,N)$ is the quotient of the group of crossed homomorphisms by this subgroup. In the special case that the action of G on N is trivial, the crossed homomorphisms reduce to ordinary homomorphisms of G into N, and every coboundary is 0. Thus $H^1(G,N)$ is the group of homomorphisms of G into N if G acts trivially on N.
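For a small example the groups in degrees 0 and 1 can be found by exhaustive search. The sketch below assumes $G = C_2$ (written additively as {0, 1}) acting on $N = \mathbb Z/4$ by inversion, the nontrivial action from the running example. It lists the invariants, counts the crossed homomorphisms $f$ (stored as dictionaries $u \mapsto f(u)$), and counts the coboundaries $u \mapsto ux - x$.

```python
from itertools import product

M = 4           # N = Z/4, written additively
G = (0, 1)      # G = C2, written additively

def act(u, x):
    """The nontrivial element of C2 acts on Z/4 by inversion."""
    return (-x) % M if u else x % M

# Degree 0: H^0 = N^G, the invariants of the action
invariants = [x for x in range(M) if all(act(u, x) == x for u in G)]

# Degree 1: 1-cocycles are crossed homomorphisms f(uv) = f(u) + u(f(v))
def is_crossed_hom(f):
    return all(f[(u + v) % 2] == (f[u] + act(u, f[v])) % M for u in G for v in G)

cochains = [dict(zip(G, vals)) for vals in product(range(M), repeat=len(G))]
Z1 = [f for f in cochains if is_crossed_hom(f)]
# 1-coboundaries u -> ux - x, stored as tuples (f(0), f(1))
B1 = {tuple((act(u, x) - x) % M for u in G) for x in range(M)}
print(invariants, len(Z1), len(B1), len(Z1) // len(B1))  # [0, 2] 4 2 2
```

So $H^0 = \{0, 2\}$ and $H^1$ has order 2 for this action.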

Degree 2. Here f is a function from $G \times G$ into N, and
$$(\delta_2 f)(u,v,w) = u(f(v,w)) - f(uv,w) + f(u,vw) - f(u,v).$$
The cocycle condition is that
$$u(f(v,w)) + f(u,vw) = f(uv,w) + f(u,v) \qquad\text{for all } u, v, w \in G.$$
This is the same as the condition that $\{f(u,v)\}$ be a factor set for extensions of N by G relative to the given action of G on N by automorphisms. A coboundary is a function $f : G \times G \to N$ of the form
$$f(u,v) = (\delta_1\alpha)(u,v) = u(\alpha(v)) - \alpha(uv) + \alpha(u) \qquad\text{for some } \alpha : G \to N.$$
This is the same as the condition that $\{-f(u,v)\}$ be a factor set equivalent to 0. Thus we can restate Proposition 7.38 as follows.

Proposition 7.40. Let G and N be groups with N abelian, and suppose that $\tau : G \to \operatorname{Aut} N$ is a homomorphism. Then the set of equivalence classes of group extensions of N by G corresponding to the action $\tau : G \to \operatorname{Aut} N$ is parametrized by $H^2(G,N)$.
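Proposition 7.40 predicts two equivalence classes of extensions of $C_4$ by $C_2$ for each action, matching $C_4 \times C_2$ and $C_8$ for the trivial action and $D_4$ and $H_8$ for inversion. The brute-force sketch below computes $|H^2(G,N)| = |Z^2|/|B^2|$ in both cases under those assumptions. Cochains are dictionaries on pairs, and `make_action` is a hypothetical helper that builds either action.

```python
from itertools import product

M = 4                                   # N = Z/4, written additively
G = (0, 1)                              # G = C2, written additively
PAIRS = list(product(G, repeat=2))
TRIPLES = list(product(G, repeat=3))

def make_action(nontrivial):
    def act(u, x):
        return (-x) % M if (nontrivial and u) else x % M
    return act

def is_cocycle(f, act):
    """Factor-set condition u(f(v,w)) + f(u,vw) = f(uv,w) + f(u,v)."""
    for u, v, w in TRIPLES:
        lhs = act(u, f[(v, w)]) + f[(u, (v + w) % 2)]
        rhs = f[((u + v) % 2, w)] + f[(u, v)]
        if (lhs - rhs) % M:
            return False
    return True

def h2_order(act):
    all_cochains = (dict(zip(PAIRS, vals)) for vals in product(range(M), repeat=len(PAIRS)))
    Z2 = [f for f in all_cochains if is_cocycle(f, act)]
    B2 = set()
    for vals in product(range(M), repeat=len(G)):
        alpha = dict(zip(G, vals))
        # (delta_1 alpha)(u, v) = u(alpha(v)) - alpha(uv) + alpha(u)
        B2.add(tuple((act(u, alpha[v]) - alpha[(u + v) % 2] + alpha[u]) % M
                     for u, v in PAIRS))
    return len(Z2) // len(B2)

print(h2_order(make_action(False)), h2_order(make_action(True)))  # 2 2
```

The division is exact because $B^2 \subseteq Z^2$ by Proposition 7.39.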

Since group extensions have such a nice interpretation in terms of cohomology groups $H^2$, it is reasonable to look for a nice interpretation for $H^1$ as well. Indeed, $H^1$ has an interpretation in terms of uniqueness up to inner isomorphisms for semidirect-product decompositions. We continue with the abelian group N, a group G, and a group action τ of G on N by automorphisms. A semidirect product $E = G \times_\tau N$ is an allowable extension. Since G embeds as a subgroup of E, we are given a one-one group homomorphism $u \mapsto \bar u$ of G into E. The construction at the beginning of this section works with the set $\{\bar u\}$ of coset representatives, and they have $\bar u\bar v = \overline{uv}$.
Suppose that the semidirect product can be formed by a second one-one group homomorphism $u \mapsto \tilde u$ of G into E. If we write $\tilde u = \alpha(u)\bar u$ for a function $\alpha : G \to N$, then we know from earlier in the section that the extensions formed from $\{\bar u\}$ and from $\{\tilde u\}$ are equivalent. Because G maps homomorphically into E for both systems, the factor sets are 0 in both cases. Consequently the function α must satisfy
$$\alpha(uv) - \alpha(u) - \tau(u)\alpha(v) = 0.$$
This is exactly the condition that $\alpha : G \to N$ be a 1-cocycle. Thus the group $Z^1(G,N)$ parametrizes all ways that we can embed G as a complementary subgroup to N in the semidirect product $E = G \times_\tau N$.

A relatively trivial way to construct a one-one group homomorphism $u \mapsto \tilde u$ from $u \mapsto \bar u$ is to form, in the usual multiplicative notation, $\tilde u = x_0^{-1}\bar u x_0$ for some $x_0 \in N$. Then $\tilde u = x_0^{-1}\bar u x_0\bar u^{-1}\bar u = x_0^{-1}(\tau(u)(x_0))\bar u$, and the additive notation for $\alpha(u)$ has $\alpha(u) = \tau(u)(x_0) - x_0$. Referring to our earlier computations in degree 1, we see that α is in the group $B^1(G,N)$ of coboundaries.

The conclusion is that $H^1(G,N)$ parametrizes all ways, modulo relatively trivial ways, that we can embed G as a complementary subgroup to N in the semidirect product $E = G \times_\tau N$.
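This interpretation can be checked on $D_4 = C_2 \times_\tau C_4$, with τ the inversion action. In the sketch below, which uses a hypothetical encoding of elements as pairs `(x, u)`, a complement to N is $\{1, (x,1)\}$ with $(x,1)$ of order 2. Each complement corresponds to the homomorphism $u \mapsto \tilde u$ sending the generator of $C_2$ to $(x,1)$. Conjugating by elements $x_0 \in N$ produces the relatively trivial changes. The counts 4 and 2 agree with the number of crossed homomorphisms and with $|H^1(C_2, \mathbb Z/4)|$ for this action.

```python
M = 4  # E = C2 x_tau Z/4 with tau = inversion, i.e. D4

def tau(u, y):
    return (-y) % M if u else y % M

def mul(p, q):
    (x, u), (y, v) = p, q
    return ((x + tau(u, y)) % M, (u + v) % 2)

def inv(p):
    x, u = p
    return ((-tau(u, x)) % M, u)

IDENTITY = (0, 0)
# Complements {IDENTITY, (x, 1)} correspond to the x with (x, 1) of order 2.
complements = [x for x in range(M) if mul((x, 1), (x, 1)) == IDENTITY]

# Orbits of complements under conjugation by elements (x0, 0) of N.
classes = {frozenset(mul(mul((x0, 0), (x, 1)), inv((x0, 0)))[0] for x0 in range(M))
           for x in complements}
print(len(complements), len(classes))  # 4 2
```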

As  promised,  we  now  return  to  the  proof  of  Proposition  7.39. 

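Proof of Proposition 7.39. For n = 1, we have
$$\begin{aligned}
(\delta_1\delta_0 f)(u,v) &= u((\delta_0 f)(v)) - (\delta_0 f)(uv) + (\delta_0 f)(u)\\
&= u(vf - f) - (uvf - f) + (uf - f) = 0.
\end{aligned}$$
For n > 1, we begin with
$$\begin{aligned}
(\delta_n\delta_{n-1}f)(g_1,\dots,g_{n+1}) &= g_1\big((\delta_{n-1}f)(g_2,\dots,g_{n+1})\big)\\
&\quad + \sum_{i=1}^{n}(-1)^i(\delta_{n-1}f)(g_1,\dots,g_ig_{i+1},\dots,g_{n+1})\\
&\quad + (-1)^{n+1}(\delta_{n-1}f)(g_1,\dots,g_n)\\
&= \mathrm{I} + \mathrm{II} + \mathrm{III}.
\end{aligned}$$
Here
$$\begin{aligned}
\mathrm{I} &= g_1g_2\big(f(g_3,\dots,g_{n+1})\big) + \sum_{i=2}^{n}(-1)^{i-1}g_1\big(f(g_2,\dots,g_ig_{i+1},\dots,g_{n+1})\big)\\
&\quad + (-1)^n g_1\big(f(g_2,\dots,g_n)\big) = \mathrm{IA} + \mathrm{IB} + \mathrm{IC},\\
\mathrm{II} &= -(\delta_{n-1}f)(g_1g_2,g_3,\dots,g_{n+1}) + \sum_{i=2}^{n}(-1)^i(\delta_{n-1}f)(g_1,\dots,g_ig_{i+1},\dots,g_{n+1})\\
&= \mathrm{IIA} + \mathrm{IIB},\\
\mathrm{III} &= (-1)^{n+1}g_1\big(f(g_2,\dots,g_n)\big) + (-1)^{n+1}(-1)f(g_1g_2,g_3,\dots,g_n)\\
&\quad + (-1)^{n+1}\sum_{i=2}^{n-1}(-1)^i f(g_1,\dots,g_ig_{i+1},\dots,g_n) + (-1)^{n+1}(-1)^n f(g_1,\dots,g_{n-1})\\
&= \mathrm{IIIA} + \mathrm{IIIB} + \mathrm{IIIC} + \mathrm{IIID}.
\end{aligned}$$
Terms IIA and IIB decompose further as
$$\begin{aligned}
\mathrm{IIA} &= -g_1g_2\big(f(g_3,\dots,g_{n+1})\big) + f(g_1g_2g_3,g_4,\dots,g_{n+1})\\
&\quad - \sum_{i=3}^{n}(-1)^{i+1}f(g_1g_2,\dots,g_ig_{i+1},\dots,g_{n+1}) - (-1)^n f(g_1g_2,g_3,\dots,g_n)\\
&= \mathrm{IIAa} + \mathrm{IIAb} + \mathrm{IIAc} + \mathrm{IIAd},\\
\mathrm{IIB} &= \sum_{i=2}^{n}(-1)^i g_1\big(f(g_2,\dots,g_ig_{i+1},\dots,g_{n+1})\big)\\
&\quad + (-1)^2(-1)f(g_1g_2g_3,g_4,\dots,g_{n+1})\\
&\quad + \sum_{i=3}^{n}(-1)^i(-1)f(g_1g_2,\dots,g_ig_{i+1},\dots,g_{n+1})\\
&\quad + \sum_{i=2}^{n}(-1)^i\sum_{j=2}^{i-2}(-1)^j f(g_1,\dots,g_jg_{j+1},\dots,g_ig_{i+1},\dots,g_{n+1})\\
&\quad + \sum_{i=3}^{n}(-1)^i(-1)^{i-1}f(g_1,\dots,g_{i-1}g_ig_{i+1},\dots,g_{n+1})\\
&\quad + \sum_{i=2}^{n-1}(-1)^i(-1)^i f(g_1,\dots,g_ig_{i+1}g_{i+2},\dots,g_{n+1})\\
&\quad + \sum_{i=2}^{n-2}(-1)^i\sum_{j=i+2}^{n}(-1)^{j-1}f(g_1,\dots,g_ig_{i+1},\dots,g_jg_{j+1},\dots,g_{n+1})\\
&\quad + \sum_{i=2}^{n-1}(-1)^i(-1)^n f(g_1,\dots,g_ig_{i+1},\dots,g_n)\\
&\quad + (-1)^n(-1)^n f(g_1,\dots,g_{n-1})\\
&= \mathrm{IIBa} + \mathrm{IIBb} + \mathrm{IIBc} + \mathrm{IIBd} + \mathrm{IIBe} + \mathrm{IIBf} + \mathrm{IIBg} + \mathrm{IIBh} + \mathrm{IIBi}.
\end{aligned}$$
Inspection shows that we have cancellation between term IA and term IIAa, term IB and term IIBa, term IC and term IIIA, term IIAb and term IIBb, term IIAc and term IIBc, term IIAd and term IIIB, term IIBd and term IIBg, term IIBe and term IIBf, term IIBh and term IIIC, and term IIBi and term IIID. All the terms cancel, and we conclude that $\mathrm{I} + \mathrm{II} + \mathrm{III} = 0$, that is, $\delta_n\delta_{n-1}f = 0$. □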
7. Problems

1.  Using  Burnside’s  Theorem  and  Problem  34  at  the  end  of  Chapter  IV,  show  that 
60  is  the  smallest  possible  order  of  a  nonabelian  simple  group. 

2. A commutator in a group is any element of the form $xyx^{-1}y^{-1}$.

(a)  Prove  that  the  inverse  of  a  commutator  is  a  commutator. 

(b)  Prove  that  any  conjugate  of  a  commutator  is  a  commutator. 

3. Let a and b be elements of a group G. Prove that the subgroup generated by a and b is the same as the subgroup generated by $bab^2$ and $bab^3$.

4.  A  subgroup  H  of  a  group  G  is  said  to  be  characteristic  if  it  is  carried  into  itself 
by  every  automorphism  of  G. 

(a)  Prove  that  characteristic  implies  normal. 

(b) Prove that the center $Z_G$ of G is a characteristic subgroup.

(c)  Prove  that  the  commutator  subgroup  G'  of  G  is  a  characteristic  subgroup. 

5. In the terminology of the previous problem, which subgroups of the quaternion subgroup $H_8$ are characteristic?

6.  Is  every  finite  group  finitely  presented?  Why  or  why  not? 

7. Let $G = SL(2, \mathbb R)$, and let $G'$ be the commutator subgroup.

(a)  Prove  that  every  element  ^  J  ^  is  in  G' . 

(b) Prove that $G' = G$.

(c)  Prove  that  ^  ^  9  ^  is  not  a  commutator  even  though  it  is  in  G' . 

8. Problem 53 at the end of Chapter IV produced a group G of order 27 generated by two elements a and b satisfying $a^9 = b^3 = b^{-1}aba^{-4} = 1$. Prove that G is given by generators and relations as
$$G = \langle a, b \mid a^9, b^3, b^{-1}aba^{-4}\rangle.$$

9. Let $G_n$ be given by generators and a single relation as
$$G_n = \langle x_1, y_1, \dots, x_n, y_n \mid x_1y_1x_1^{-1}y_1^{-1}\cdots x_ny_nx_n^{-1}y_n^{-1}\rangle.$$
Prove that $G_n/G_n'$ is free abelian of rank 2n, and conclude that the groups $G_n$ are mutually nonisomorphic as n varies. (Educational note related to topology: The group $G_n$ may be shown to be the fundamental group of a compact orientable 2-dimensional manifold without boundary and with n handles.)

10.  Prove  that  a  free  group  of  finite  rank  n  cannot  be  generated  by  fewer  than  n 
elements. 
11. Let F be the free group on generators a, b, c, and let H be the subgroup generated by all words of length 2.

(a) Find coset representatives g such that F is the disjoint union of the cosets Hg.

(b) Find a free basis of H.

12.  For  the  free  group  on  generators  x  and  y,  prove  that  the  elements  y,  xyx~l , 
x2yx~2,  x3yx~3, . . .  ,  constitute  a  free  basis  of  the  subgroup  that  they  generate. 
Conclude  that  a  free  group  of  rank  2  has  a  free  subgroup  of  infinite  rank. 

13. Let $G = C_2 * C_2$. Prove that the only quotient groups of G, up to isomorphism, are G itself, {1}, $C_2$, $C_2 \times C_2$, and the dihedral groups $D_n$ for $n \ge 3$.

14.  Prove  that  if  every  irreducible  finite-dimensional  representation  of  a  finite  group 
G is 1-dimensional, then G is abelian.

15.  Let  G  be  a  finitely  generated  group,  and  let  H  be  a  subgroup  of  finite  index. 
Prove  that  H  is  finitely  generated. 

16. Let N be an abelian group, let G be a group, let τ be an action of G on N by automorphisms, and let n > 0 be an integer.

(a) Prove that if every element of N has finite order dividing an integer m, then every member of $H^n(G,N)$ has finite order dividing m.

(b) Suppose that G is finite and that f is an n-cocycle. Define an (n − 1)-cochain F by
$$F(g_1,\dots,g_{n-1}) = \sum_{g\in G} f(g_1,\dots,g_{n-1},g).$$
By summing the cocycle condition for f over the last variable, express $|G|f(g_1,\dots,g_n)$ in terms of F, and deduce that $|G|f$ is a coboundary. Conclude that every member of $H^n(G,N)$ has order dividing $|G|$.

17.  Let  G  be  a  finite  group.  Suppose  that  G  has  a  normal  abelian  subgroup  N,  and 
suppose that $\mathrm{GCD}(|N|, |G/N|) = 1$. Prove that there exists a subgroup H of G
such  that  G  is  the  semidirect  product  of  H  and  N. 

18.  Let  N  be  the  cyclic  group  C2,  and  let  G  be  an  arbitrary  group  of  order  4.  Identify 
up to equivalence all group extensions of N by G.

19. Let $N = C_2$, and let $E = (C_2 \oplus C_4)$. Regard E as an extension of N in two ways, first by embedding N as one of the summands $C_2$ of E and then by embedding N as a subgroup of one of the summands $C_4$ of E. Show that the quotient groups E/N in the two cases are isomorphic, that E/N acts trivially on N in both cases, and that the two group extensions are not equivalent.

Problems 20-22 concern Baer multiplication of extensions. Let N be an abelian group, let G be a group, let τ be an action of G on N by automorphisms, and let $E_1$ and $E_2$ be two extensions of N by G relative to τ. Write $\varphi_1 : E_1 \to G$ and $\varphi_2 : E_2 \to G$ for the quotient mappings. Let $(E_1, E_2)$ denote the subgroup of all members $(e_1, e_2)$ of $E_1 \times E_2$ for which $\varphi_1(e_1) = \varphi_2(e_2)$. Writing the operation in N multiplicatively, let $Q = \{(x, x^{-1}) \in E_1 \times E_2 \mid x \in N\}$. The Baer product of $E_1$ and $E_2$ is defined to be the quotient $(E_1, E_2)/Q$. A typical coset of the Baer product will be denoted by $(e_1, e_2)Q$.

20. Prove that the homomorphism $x \mapsto (x, 1)Q$ is one-one from N into $(E_1, E_2)/Q$, that the homomorphism $\varphi : (E_1, E_2) \to G$ defined by $\varphi(e_1, e_2) = \varphi_1(e_1)$ has image G and descends to the quotient $(E_1, E_2)/Q$, and that the kernel of the descended φ is the embedded copy of N. (Therefore $(E_1, E_2)/Q$ is an extension of N by G, evidently relative to τ.)

21. For each $u \in G$, select $\bar u \in E_1$ and $\tilde u \in E_2$ with $\varphi_1(\bar u) = u = \varphi_2(\tilde u)$, and define $a(u,v)$ and $b(u,v)$ for u and v in G by $\bar u\bar v = a(u,v)\overline{uv}$ and $\tilde u\tilde v = b(u,v)\widetilde{uv}$. Show that $(\bar u, \tilde u)Q$ has $\varphi((\bar u, \tilde u)Q) = u$ and that the associated 2-cocycle for $(E_1, E_2)/Q$ is $a(u,v)b(u,v)$ if the group operation in N is written multiplicatively.

22. Prove that Baer multiplication descends to a well-defined multiplication of equivalence classes of extensions of N by G relative to τ, in the following sense: Suppose that $E_1$ and $E_1'$ are equivalent extensions and that $E_2$ and $E_2'$ are equivalent extensions. Let $(E_1, E_2)/Q$ and $(E_1', E_2')/Q'$ be the Baer products. Then $(E_1, E_2)/Q$ is equivalent to $(E_1', E_2')/Q'$. Conclude that if Baer multiplication is imposed on equivalence classes of extensions of N by G relative to τ, then the correspondence stated in Proposition 7.40 of equivalence classes to members of $H^2(G,N)$ is a group isomorphism.

Problems 23-24 derive the Poisson summation formula for finite abelian groups. If G is a finite abelian group and $\hat G$ is its group of multiplicative characters, then the Fourier coefficient at $\chi \in \hat G$ of a function f in $C(G, \mathbb C)$ is $\hat f(\chi) = \sum_{g\in G} f(g)\overline{\chi(g)}$. The Fourier inversion formula in Theorem 7.17 says that $f(g) = |G|^{-1}\sum_{\chi\in\hat G}\hat f(\chi)\chi(g)$.
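For a cyclic group the definitions above can be computed directly. The sketch below assumes $G = C_m$ with the characters $\chi_n(k) = e^{2\pi i nk/m}$ (the parametrization used later in Problems 29-31) and represents a function on $C_m$ as a Python list of length m. The names `fourier_coefficients` and `fourier_inversion` are illustrative. The round trip recovers f up to floating-point error, as Theorem 7.17 predicts.

```python
import cmath

def fourier_coefficients(f):
    """f-hat(chi_n) = sum_k f(k) * conj(chi_n(k)), with chi_n(k) = exp(2 pi i n k / m)."""
    m = len(f)
    return [sum(f[k] * cmath.exp(-2j * cmath.pi * n * k / m) for k in range(m))
            for n in range(m)]

def fourier_inversion(fhat):
    """f(k) = |G|^{-1} sum_n f-hat(chi_n) chi_n(k)."""
    m = len(fhat)
    return [sum(fhat[n] * cmath.exp(2j * cmath.pi * n * k / m) for n in range(m)) / m
            for k in range(m)]

f = [1, 2, 0, -1, 3, 5]
g = fourier_inversion(fourier_coefficients(f))
print(all(abs(a - b) < 1e-9 for a, b in zip(f, g)))  # True
```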

23. Let G be a finite abelian group, let H be a subgroup, and let G/H be the quotient group. If t is in G, write $\dot t$ for the coset of t in G/H. Let f be in $C(G, \mathbb C)$ and define $F(\dot t) = \sum_{h\in H} f(t + h)$ as a function on G/H. Suppose that χ is a member of $\hat G$ that is identically 1 on H, so that χ descends to a member $\tilde\chi$ of $\widehat{G/H}$. Prove that $\hat f(\chi) = \hat F(\tilde\chi)$.

24. (Poisson summation formula) With f and F as in the previous problem, apply the Fourier inversion formula for G/H to the function F, and derive the formula
$$\sum_{h\in H} f(t + h) = \frac{1}{|G/H|}\sum_{\omega\in\hat G,\ \omega|_H = 1}\hat f(\omega)\omega(t).$$
(Educational note: This formula is often applied with t = 0, in which case it reduces to $\sum_{h\in H} f(h) = \frac{1}{|G/H|}\sum_{\omega\in\hat G,\ \omega|_H = 1}\hat f(\omega)$.)
Problems 25-28 continue the introduction to error-correcting codes begun in Problems 63-73 at the end of Chapter IV, combining those results with the Poisson summation formula in the problems above and with notions from Section VI.1. Let $\mathbb F$ be the field $\mathbb Z/2\mathbb Z$, and form the Hamming space $\mathbb F^n$. Define a nondegenerate bilinear form on $\mathbb F^n$ by $(a, c) = \sum_{i=1}^n a_ic_i$ for a and c in $\mathbb F^n$. Recall from Chapter IV that a linear code C is a vector subspace of $\mathbb F^n$. For such a C, let $C^\perp$ as in Section VI.1 be the set of all $a \in \mathbb F^n$ such that $(a, c) = 0$ for all $c \in C$; the linear code $C^\perp$ is called the dual code. A linear code is self dual if $C^\perp = C$.
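For small n the dual code and the weight distribution can be computed by brute force from the definitions above. The sketch below represents vectors of $\mathbb F^n$ as tuples of 0s and 1s and uses the repetition code of length 4 as a sample input. The function names are illustrative, and enumerating all of $\mathbb F^n$ is practical only for small n.

```python
from itertools import product

def dot(a, c):
    """The bilinear form (a, c) = sum a_i c_i in F = Z/2Z."""
    return sum(x * y for x, y in zip(a, c)) % 2

def dual_code(C, n):
    """Brute-force C-perp: all a in F^n with (a, c) = 0 for every c in C."""
    return [a for a in product((0, 1), repeat=n) if all(dot(a, c) == 0 for c in C)]

def weight_distribution(C, n):
    """N_k(C) for k = 0, ..., n: the number of members of C of weight k."""
    counts = [0] * (n + 1)
    for c in C:
        counts[sum(c)] += 1
    return counts

n = 4
repetition = [(0,) * n, (1,) * n]
perp = dual_code(repetition, n)
print(len(perp), weight_distribution(perp, n))  # 8 [1, 0, 6, 0, 1]
```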

25. (a) Show that the codes 0 and $\mathbb F^n$ are dual to each other.

(b) Show that the repetition code and the parity-check code are dual to each other.

(c) Show that the Hamming code of order 8 is self dual.

(d) Show that any self-dual linear code C has $\dim C = n/2$, and conclude that the Hamming code of order $2^r$ with $r > 3$ is not self dual.

(e) Show that any member c of a self-dual linear code C has even weight.

(f) Show that if a linear code C has $C \subseteq C^\perp$ and if every member c of C has even weight, then $c \mapsto \frac12\mathrm{wt}(c) \bmod 2$ is a group homomorphism of C into $\mathbb Z/2\mathbb Z$. Here $\mathrm{wt}(c)$ denotes the weight of c.

26. Regard $\mathbb F^n$ as an additive group G to which the Fourier inversion formula of Section 4 can be applied.

(a) Show that one can map $\hat G$ to $\mathbb F^n$ by $\chi \mapsto a_\chi$ with $\chi(c) = (-1)^{(a_\chi, c)}$ and that the result is a group isomorphism. (Therefore if f is in $C(\mathbb F^n, \mathbb C)$, we can henceforth regard $\hat f$ as a function on $\mathbb F^n$.)

(b) Show under the identification in (a) that if f is in $C(\mathbb F^n, \mathbb C)$, then $\hat f(a) = \sum_{c\in\mathbb F^n} f(c)(-1)^{(a,c)}$ for a in $\mathbb F^n$.

(c) Suppose that the function $f \in C(\mathbb F^n, \mathbb C)$ is of the special form $f(c) = \prod_{i=1}^n f_i(c_i)$ whenever $c = (c_1, \dots, c_n)$. Here each $f_i$ is a function on the 2-element group $\mathbb F$. Prove that $\hat f(a) = \prod_{i=1}^n \hat f_i(a_i)$ whenever $a = (a_1, \dots, a_n)$. Here $\hat f_i$ is given by the formula of (b) for the case n = 1: $\hat f_i(a_i) = f_i(0) + (-1)^{a_i}f_i(1)$.

27. Fix two complex numbers x and y. Define $f_0 : \mathbb F \to \mathbb C$ to be the function with $f_0(0) = x$ and $f_0(1) = y$. Define $f : \mathbb F^n \to \mathbb C$ to be the function with $f(c) = \prod_{i=1}^n f_0(c_i) = x^{n-\mathrm{wt}(c)}y^{\mathrm{wt}(c)}$, where $\mathrm{wt}(c)$ is the weight of c.

(a) Show that $\hat f_0(0) = x + y$ and $\hat f_0(1) = x - y$.

(b) Show that $\hat f(a) = (x + y)^{n-\mathrm{wt}(a)}(x - y)^{\mathrm{wt}(a)}$.

28. Let C be a linear code in $\mathbb F^n$. Take G to be the additive group of $\mathbb F^n$ and H to be the additive group of C. Regard $C^\perp$ as an additive group also.

(a) Map $\widehat{G/H}$ to $C^\perp$ by $\chi \mapsto a_\chi$ with $\chi(c) = (-1)^{(a_\chi, c)}$. Show that this mapping is a group isomorphism.

(b) Applying the Poisson summation formula of Problem 24, prove that
$$\sum_{h\in C} f(h) = \frac{1}{|C^\perp|}\sum_{a\in C^\perp}\hat f(a)$$
for all f in $C(\mathbb F^n, \mathbb C)$.

(c) (MacWilliams identity) Let $W_C(X, Y) = \sum_{k=0}^n N_k(C)X^{n-k}Y^k$, where $N_k(C)$ is the number of members of C with weight k, be the weight-enumerator polynomial of C, and let $W_{C^\perp}(X, Y)$ be defined similarly. By applying (b) to the function f in the previous problem, prove that $W_C(x, y) = |C^\perp|^{-1}W_{C^\perp}(x + y, x - y)$ for each x and y. Conclude from Corollary 4.32 that weight-enumerator polynomials satisfy $W_C(X, Y) = |C^\perp|^{-1}W_{C^\perp}(X + Y, X - Y)$.

(d) The polynomials $W_C(X, Y)$ were seen in Chapter IV to be $X^n$ for the 0 code, $(X + Y)^n$ for the code $\mathbb F^n$, $X^n + Y^n$ for the repetition code, $\frac12((X + Y)^n + (X - Y)^n)$ for the parity-check code, and $X^8 + 14X^4Y^4 + Y^8$ for the Hamming code of order 8. Using relationships established in Problem 25, verify the result of (c) for each of these codes.

(e) Suppose that C is a self-dual linear code. Applying (c) in this case, exhibit $W_C(X, Y)$ as being invariant under a copy of the dihedral group of order 16. (Educational note: If the polynomial $W_C(X, Y)$ is invariant also under $X \mapsto iX$, as is true for the Hamming code of order 8, then $W_C(X, Y)$ is invariant under the group generated by $D_8$ and this transformation, which can be shown to have order 192.)


Problems 29-31 concern an unexpectedly fast method of computation of Fourier coefficients in the context of finite abelian groups, particularly in the context of cyclic groups. They show for a cyclic group of order m = pq that the use of the idea behind the Poisson summation formula of Problem 24 makes it possible to compute the Fourier coefficients of a function in about pq(p + q) steps rather than the expected $m^2 = p^2q^2$ steps. This savings may be iterated in the case of a cyclic group of order $2^n$ so that the Fourier coefficients are computed in about $n2^n$ steps rather than the expected $2^{2n}$ steps. An organized algorithm to implement this method of computation is known as the fast Fourier transform. Write the cyclic group $C_m$ as the set $\{0, 1, 2, \dots, m - 1\}$ of integers modulo m under addition, and let $\zeta_m = e^{2\pi i/m}$. For n in $C_m$ define a multiplicative character of $C_m$ by $\chi_n(k) = (\zeta_m^n)^k$. The resulting m multiplicative characters satisfy $\chi_n\chi_{n'} = \chi_{n+n'}$, and they exhaust $\widehat{C_m}$ since distinct multiplicative characters are orthogonal. It will be convenient to identify $\chi_n$ with $\chi_n(1) = \zeta_m^n$.

29. In the setting of Problem 23, suppose that $G = C_m$ with $m = pq$; here p and q
need not be relatively prime. Let $H = \{0, q, 2q, \dots, (p-1)q\}$ be the subgroup
of G isomorphic to $C_p$, so that $G/H = \{0, 1, 2, \dots, q - 1\}$ is isomorphic to
$C_q$. Prove that the characters χ of G identified with $\zeta_m^0, \zeta_m^p, \zeta_m^{2p}, \dots, \zeta_m^{(q-1)p}$
are the ones that are identically 1 on H and therefore descend to characters
of G/H. Verify that the descended characters $\dot\chi$ are the ones identified with
$\zeta_q^0, \zeta_q^1, \dots, \zeta_q^{q-1}$. Consequently the formula $\hat f(\chi) = \hat F(\dot\chi)$ of Problem 23
provides a way of computing $\hat f$ at $\zeta_m^0, \zeta_m^p, \zeta_m^{2p}, \dots, \zeta_m^{(q-1)p}$ from the values of
$\hat F$. Show that if $\hat F$ is computed from the definition of Fourier coefficients, then
the number of steps involved in its computation is about $q^2$, apart from a constant
factor. Show therefore that the total number of steps in computing $\hat f$ at these
special values of χ is on the order of $q^2 + pq$.

30. In the previous problem show for each k with $0 \le k \le p - 1$ that the value of $\hat f$ at
$\zeta_m^k, \zeta_m^{p+k}, \zeta_m^{2p+k}, \dots, \zeta_m^{(q-1)p+k}$ can be handled in the same way with a different
F by replacing f by a suitable variant of f. Doing so for each k requires p times
the number of steps detected in the previous problem, and therefore all of $\hat f$ can
be computed in about $p(q^2 + pq) = pq(p + q)$ steps.
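The bookkeeping in Problems 29 and 30 can be sketched in a few lines of Python. This is a minimal illustration, not an efficient implementation, and the names `dft_naive` and `dft_pq` are hypothetical. It assumes the unnormalized convention $\hat f(n) = \sum_x f(x)\zeta_m^{-nx}$. The book's normalization may differ by a constant factor, which does not affect the step counts.

For each k, the sketch does three things:

1. It twists f by $\zeta_m^{-kx}$, which plays the role of the "suitable variant" in Problem 30.
2. It sums the twisted function over the cosets of H, giving a function F on $C_q$.
3. It transforms F directly, in about $q^2$ steps.

```python
import cmath

def dft_naive(f):
    """Fourier coefficients fhat(n) = sum_x f(x) zeta_m^(-n x), about m^2 steps."""
    m = len(f)
    return [sum(f[x] * cmath.exp(-2j * cmath.pi * n * x / m) for x in range(m))
            for n in range(m)]

def dft_pq(f, p, q):
    """The same coefficients for m = p*q in about p*(q^2 + p*q) steps."""
    m = p * q
    if len(f) != m:
        raise ValueError("length of f must be p*q")
    fhat = [0j] * m
    for k in range(p):
        # Problem 30: twisting by zeta_m^(-k x) moves the coefficients at
        # n = k, p + k, 2p + k, ... to n = 0, p, 2p, ...
        g = [f[x] * cmath.exp(-2j * cmath.pi * k * x / m) for x in range(m)]
        # Problem 29: sum over the cosets y + H, H = {0, q, ..., (p-1)q}, giving F on C_q.
        F = [sum(g[y + j * q] for j in range(p)) for y in range(q)]
        Fhat = dft_naive(F)                      # about q^2 steps
        for r in range(q):
            fhat[r * p + k] = Fhat[r]
    return fhat

f = [1, 2, 0, -1, 3, 5]
print(max(abs(u - v) for u, v in zip(dft_naive(f), dft_pq(f, 2, 3))) < 1e-9)  # True
```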

31. Show how iteration of this process to compute the Fourier coefficients of each F,
together with further iteration of this process, allows one to compute the Fourier
coefficients for a function on $C_{m_1m_2\cdots m_r}$ in about $m_1m_2\cdots m_r(m_1 + m_2 + \cdots + m_r)$
steps.
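Iterating the case p = 2 gives the familiar recursive radix-2 scheme for a cyclic group of order $2^n$. The sketch below uses the same assumed sign convention as before, and the name `fft_pow2` is hypothetical. The direct $q^2$-step transform of F is replaced by a recursive call. The cost T therefore satisfies $T(2^n) = 2T(2^{n-1}) + O(2^n)$, which yields the count of about $n2^n$ steps.

```python
import cmath

def fft_pow2(f):
    """Fourier coefficients on C_(2^n), iterating the p = 2 case of Problems 29-30."""
    m = len(f)
    if m == 0 or m & (m - 1):
        raise ValueError("length must be a power of 2")
    if m == 1:
        return [complex(f[0])]
    q = m // 2
    fhat = [0j] * m
    for k in range(2):
        g = [f[x] * cmath.exp(-2j * cmath.pi * k * x / m) for x in range(m)]
        F = [g[y] + g[y + q] for y in range(q)]   # sum over the coset y + {0, q}
        Fhat = fft_pow2(F)                          # recursion replaces the q^2 step
        for r in range(q):
            fhat[2 * r + k] = Fhat[r]
    return fhat

print([round(abs(v), 6) for v in fft_pow2([1, 1, 1, 1])])   # [4.0, 0.0, 0.0, 0.0]
```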

Problems 32–36 concern contragredient representations and the decomposition of the
left regular representation of a finite group G. They make use of Problems 24–28 in
Chapter III, which introduce the complex conjugate $\bar V$ of a complex vector space V. In
the case that V is an inner-product space, those problems define $(\bar u, \bar v)_{\bar V} = (v, u)_V$,
and they show that if $\ell_v \in V'$ is given by $\ell_v(u) = (u, v)_V = (\bar v, \bar u)_{\bar V}$, then the
mapping $\ell_v \mapsto \bar v$ is an isomorphism of $V'$ with $\bar V$.

32. Show that the definition $(\ell_{v_1}, \ell_{v_2})_{V'} = (\bar v_1, \bar v_2)_{\bar V}$ makes the isomorphism of $V'$
with $\bar V$ preserve inner products.

33. If R is a unitary representation of G on the finite-dimensional complex vector
space V, define the contragredient representation $R^c$ of G on $V'$ by $R^c(x) = R(x^{-1})^t$.
Prove that $R^c(x)\ell_v = \ell_{R(x)v}$ and that $R^c$ is unitary on $V'$.

34. Show that the matrix coefficients of $R^c$ are the complex conjugates of those of
R and that the characters satisfy $\chi_{R^c} = \overline{\chi_R}$.

35.  Give  an  example  of  an  irreducible  representation  of  a  finite  group  G  that  is  not 
equivalent  to  its  contragredient. 

36. Let ℓ be the left regular representation of G on C(G, C), and let $V_R$ be the linear
span in C(G, C) of the matrix coefficients of an irreducible representation R of
dimension d. Prove that the representation $(\ell, V_R)$ of G is equivalent to the direct
sum of d copies of the contragredient $R^c$.

Problems 37–46 concern the free product $C_2 * C_3$ and its quotients. The problems
make use of the group of matrices $SL(2, \mathbb{Z}/m\mathbb{Z})$ of determinant 1 over the commutative
ring $\mathbb{Z}/m\mathbb{Z}$, as discussed in Section V.2. One of the quotients of $C_2 * C_3$
will be $PSL(2, \mathbb{Z}) = SL(2, \mathbb{Z})/\{\text{scalar matrices}\}$, and these problems show that the
quotient mapping can be arranged to be an isomorphism. Other quotients will be
the groups $G_m = \langle X, Y; X^2, Y^3, (XY)^m\rangle$ with $m \ge 2$. These arise in connection
with tilings in 2-dimensional geometry. The isomorphism $C_2 * C_3 \cong PSL(2, \mathbb{Z})$
leads to a homomorphism that will be called $\sigma_m$ carrying $G_m$ onto
$PSL(2, \mathbb{Z}/m\mathbb{Z}) = SL(2, \mathbb{Z}/m\mathbb{Z})/\{\text{scalar matrices}\}$, the image group being finite. The problems show
that the homomorphism $\sigma_m : G_m \to PSL(2, \mathbb{Z}/m\mathbb{Z})$ is an isomorphism for the cases
in which $G_m$ arises from spherical geometry, namely for $2 \le m \le 5$, and that the
homomorphism is not an isomorphism for $m = 6$, the case in which $G_m$ arises from
Euclidean geometry.

37. Show that the elements … and … generate $SL(2, \mathbb{Z})$ by arguing as
follows: if the subgroup Γ of $SL(2, \mathbb{Z})$ generated by these two elements is not
$SL(2, \mathbb{Z})$, choose an element $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ outside Γ having $\max(|a|, |b|)$ as small as
possible, and derive a contradiction by showing that a suitable right multiple of
it by elements of Γ is in Γ.
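The descent in Problem 37 is a Euclidean algorithm on the first row (a, b). This sketch assumes the standard pair $S = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$ and $T = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$ as generators, which may not be the two elements intended in the problem. All function names are hypothetical.

Right multiplication by $T^k$ replaces b by $b + ka$. Right multiplication by S sends (a, b) to (b, −a). Alternating the two therefore lowers $\max(|a|, |b|)$ until b = 0. At that point the matrix is $\pm\begin{pmatrix} 1 & 0 \\ n & 1 \end{pmatrix}$, which is written as a word in S and T.

```python
S = ((0, -1), (1, 0))
S_INV = ((0, 1), (-1, 0))

def T(k):
    """The k-th power of T = (1 1; 0 1)."""
    return ((1, k), (0, 1))

def mul(A, B):
    return tuple(tuple(sum(A[i][t] * B[t][j] for t in range(2)) for j in range(2))
                 for i in range(2))

def evaluate(word):
    P = ((1, 0), (0, 1))
    for g in word:
        P = mul(P, g)
    return P

def word_in_S_T(M):
    """Return a list of factors (S, S_INV, or powers of T) whose product is M."""
    (a, b), (c, d) = M
    if a * d - b * c != 1:
        raise ValueError("M must have determinant 1")
    A, applied = M, []          # invariant: A = M * applied[0] * applied[1] * ...
    while A[0][1] != 0:
        a, b = A[0]
        if a != 0 and abs(b) >= abs(a):
            g = T(-(b // a))    # replaces b by b mod a, so |b| < |a| afterwards
        else:
            g = S               # (a, b) -> (b, -a)
        A = mul(A, g)
        applied.append(g)
    # Now A = (e 0; c e) with e = +-1; also (1 0; n 1) = S_INV * T(-n) * S and S^2 = -I.
    e, c = A[0][0], A[1][0]
    word = [S, S] if e == -1 else []
    word += [S_INV, T(-e * c), S]
    # Undo the right multiplications: M = A * (last applied)^-1 * ... * (first applied)^-1.
    for g in reversed(applied):
        word.append(S_INV if g == S else T(-g[0][1]))
    return word

print(evaluate(word_in_S_T(((2, 3), (1, 2)))) == ((2, 3), (1, 2)))   # True
```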

38. By mapping $X \mapsto x = \cdots \bmod \pm I$ and $Y \mapsto y = \cdots \bmod \pm I$,
produce a group homomorphism Φ of $C_2 * C_3 = \langle X, Y; X^2, Y^3\rangle$ onto $PSL(2, \mathbb{Z})$.

39. Let x, y, and $\Phi : C_2 * C_3 \to PSL(2, \mathbb{Z})$ be as in the previous problem.

(a) For any member $z = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \bmod \pm I$ of $PSL(2, \mathbb{Z})$, define
$\mu(z) = \max(|a|, |b|)$ and $\nu(z) = \max(|c|, |d|)$. Prove that if z has $ab < 0$, then
$\mu(zyx) > \mu(z)$ and $\mu(zy^{-1}x) > \mu(z)$, while if $cd < 0$, then $\nu(zyx) > \nu(z)$ and
$\nu(zy^{-1}x) > \nu(z)$.

(b) Prove that $\mu(zx) = \mu(z)$ and $\nu(zx) = \nu(z)$ for all z in $PSL(2, \mathbb{Z})$.

(c) Show that there are only 10 members z of $PSL(2, \mathbb{Z})$ for which the two
conditions $\mu(z) = 1$ and $\nu(z) = 1$ both hold.

(d) A reduced word in $C_2 * C_3$ is a finite sequence of factors X, Y, and $Y^{-1}$,
with no two consecutive factors equal and with no two consecutive factors
$YY^{-1}$ or $Y^{-1}Y$. Prove for any reduced word $a_1 \cdots a_n$ in $C_2 * C_3$, where
each $a_j$ is one of X, Y, and $Y^{-1}$, that $\mu(\Phi(a_1 \cdots a_n)) \ge \mu(\Phi(a_1 \cdots a_{n-1}))$
and that $\nu(\Phi(a_1 \cdots a_n)) \ge \nu(\Phi(a_1 \cdots a_{n-1}))$.

(e) Deduce that the homomorphism Φ is an isomorphism.
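The count in (c) can be confirmed by enumeration. A matrix in $SL(2, \mathbb{Z})$ with μ = ν = 1 has all of its entries in {−1, 0, 1}. The sketch below (with a hypothetical helper name) lists such matrices and keeps one representative of each pair {M, −M}.

```python
from itertools import product

def small_classes():
    """Classes mod +-I of matrices (a b; c d) in SL(2, Z) with mu = nu = 1."""
    classes = set()
    for a, b, c, d in product((-1, 0, 1), repeat=4):
        if a * d - b * c != 1:
            continue
        if max(abs(a), abs(b)) == 1 and max(abs(c), abs(d)) == 1:
            M = (a, b, c, d)
            neg = tuple(-t for t in M)
            classes.add(max(M, neg))   # one canonical representative of {M, -M}
    return classes

print(len(small_classes()))   # 10
```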

40. Let Γ(m) be the group of all matrices M in $SL(2, \mathbb{Z})$ such that every entry of
$M - I$ is divisible by m.

(a) Prove that passage from a matrix in $SL(2, \mathbb{Z})$ to the same matrix with its
entries considered modulo m gives a homomorphism $\psi_m : SL(2, \mathbb{Z}) \to SL(2, \mathbb{Z}/m\mathbb{Z})$
with $\ker \psi_m = \Gamma(m)$.

(b) Prove that if a, b, and m are positive integers with GCD(a, b, m) = 1,
then there exists an integer r such that GCD(a + mr, b) = 1. (One
way of proceeding is to use Dirichlet's theorem on primes in arithmetic
progressions.)

(c) Prove that image $\psi_m = SL(2, \mathbb{Z}/m\mathbb{Z})$, i.e., $\psi_m$ is onto.
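A brute-force search illustrates (b) for small values without proving it. GCD(a + mr, b) depends only on r modulo b, so it suffices to try 0 ≤ r < b. The function name is hypothetical.

```python
from math import gcd

def find_shift(a, b, m):
    """Smallest r >= 0 with GCD(a + m*r, b) = 1, assuming GCD(a, b, m) = 1."""
    if gcd(gcd(a, b), m) != 1:
        raise ValueError("need GCD(a, b, m) = 1")
    # a + m*(r + b) is congruent to a + m*r modulo b, so range(b) covers every case.
    for r in range(b):
        if gcd(a + m * r, b) == 1:
            return r
    raise AssertionError("no r found; this would contradict Problem 40(b)")

print(find_shift(2, 6, 3))   # 1, since GCD(2, 6) = 2 but GCD(5, 6) = 1
```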

41. Let $\varphi_m : C_2 * C_3 \to G_m$ be the homomorphism defined by the conditions $X \mapsto X$
and $Y \mapsto Y$. Let $H_m$ be the smallest normal subgroup of $PSL(2, \mathbb{Z})$ containing
$(xy)^m \bmod \pm I$. Let $\psi_m : SL(2, \mathbb{Z}) \to SL(2, \mathbb{Z}/m\mathbb{Z})$ be the homomorphism of
the previous problem.

(a) Why is $\varphi_m$ well defined?

(b) Why is $H_m = \Phi(\ker \varphi_m)$?

(c) Define $PSL(2, \mathbb{Z}/m\mathbb{Z}) = SL(2, \mathbb{Z}/m\mathbb{Z})/\{\text{scalar matrices}\}$. Why does the
composition of $\psi_m$ followed by passage to the quotient descend to a
homomorphism $\sigma_m$ of $PSL(2, \mathbb{Z})$ onto $PSL(2, \mathbb{Z}/m\mathbb{Z})$?

(d) If $K_m \subseteq PSL(2, \mathbb{Z})$ is the kernel of $\sigma_m$, why is $H_m \subseteq K_m$?

(e) Show that if t is any integer, then the following members of $K_m$ lie in the
subgroup $H_m$: $\cdots \bmod \pm I$, $\cdots \bmod \pm I$, $\cdots \bmod \pm I$,
and $\begin{pmatrix} 1 + tm & -tm \\ tm & 1 - tm \end{pmatrix} \bmod \pm I$.

42. With $G_m$ defined as above, exhibit homomorphisms of various groups $G_m$ onto
the following finite groups:

(a) $\mathfrak{S}_3$ when $m = 2$ by sending $X \mapsto (1\ 2)$ and $Y \mapsto (1\ 2\ 3)$.

(b) $\mathfrak{A}_4$ when $m = 3$ by sending $X \mapsto (1\ 2)(3\ 4)$ and $Y \mapsto (1\ 2\ 3)$.

(c) $\mathfrak{S}_4$ when $m = 4$ by sending $X \mapsto (1\ 2)$ and $Y \mapsto (2\ 3\ 4)$.

(d) $\mathfrak{A}_5$ when $m = 5$ by sending $X \mapsto (1\ 2)(3\ 4)$ and $Y \mapsto (1\ 3\ 5)$.
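The assignments in Problem 42 can be sanity-checked by machine. The images must satisfy $x^2 = 1$, $y^3 = 1$, and $(xy)^m = 1$, and they must generate a group of order 6, 12, 24, or 60. The sketch below uses hypothetical helper names and composes permutations as functions, applying the right-hand factor first. The order of xy does not depend on this convention, since yx is conjugate to xy.

```python
def perm_from_cycles(n, cycles):
    """Permutation of {1, ..., n} as a tuple p with p[i - 1] = image of i."""
    p = list(range(1, n + 1))
    for cyc in cycles:
        for i, x in enumerate(cyc):
            p[x - 1] = cyc[(i + 1) % len(cyc)]
    return tuple(p)

def compose(s, t):
    """(s o t)(i) = s(t(i)): apply t first, then s."""
    return tuple(s[t[i] - 1] for i in range(len(t)))

def order(s):
    e, k, p = tuple(range(1, len(s) + 1)), 1, s
    while p != e:
        p, k = compose(s, p), k + 1
    return k

def generated_group(gens):
    """Closure of the identity under left multiplication by the generators."""
    e = tuple(range(1, len(gens[0]) + 1))
    seen, frontier = {e}, [e]
    while frontier:
        new = []
        for g in frontier:
            for s in gens:
                h = compose(s, g)
                if h not in seen:
                    seen.add(h)
                    new.append(h)
        frontier = new
    return seen

CASES = {  # m: (n, cycles of the image of X, cycles of the image of Y)
    2: (3, [(1, 2)], [(1, 2, 3)]),
    3: (4, [(1, 2), (3, 4)], [(1, 2, 3)]),
    4: (4, [(1, 2)], [(2, 3, 4)]),
    5: (5, [(1, 2), (3, 4)], [(1, 3, 5)]),
}

def check(m):
    """Return (order of x, order of y, order of xy, order of <x, y>)."""
    n, x_cycles, y_cycles = CASES[m]
    x, y = perm_from_cycles(n, x_cycles), perm_from_cycles(n, y_cycles)
    return order(x), order(y), order(compose(x, y)), len(generated_group([x, y]))

for m in CASES:
    print(m, check(m))   # (2, 3, m, 6), (2, 3, m, 12), (2, 3, m, 24), (2, 3, m, 60)
```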

43. This problem shows how to prove that $H_m = K_m$ for $2 \le m \le 5$, and it asks
that the steps be carried out for $m = 2$ and $m = 3$. Recall from the remark
with Lemma 7.11 that Lemma 7.11 is valid for all groups in determining a set
of generators of a subgroup from generators of the whole group and a system of
coset representatives. The lemma is to be applied to the group $PSL(2, \mathbb{Z})$ and
the subgroup $K_m$. Generators of $PSL(2, \mathbb{Z})$ are taken as $h_1 = x \bmod \pm I$ and
$h_2 = y \bmod \pm I$.

(a) For the case $m = 2$, find members $g_1, \dots, g_6$ of $PSL(2, \mathbb{Z})$ such that the six
cosets of $PSL(2, \mathbb{Z})/K_2$ are exactly $K_2g_1, \dots, K_2g_6$.

(b) Still for the case $m = 2$, find $g_jh_i\,\rho(g_jh_i)^{-1}$ for $1 \le i \le 2$ and $1 \le j \le 6$,
where $\rho(g)$ denotes the representative $g_k$ with $K_2g = K_2g_k$.
Lemma 7.11 says that these 12 elements generate $K_2$.

(c) Using Problem 41e and any necessary variations of it, show that each of
the 12 generators of $K_2$ in (b) lies in the subgroup $H_2$, and conclude that
$H_2 = K_2$.

(d) Repeat steps (a), (b), and (c) for $m = 3$. There are 12 cosets $K_3g_j$ of
$PSL(2, \mathbb{Z})/K_3$. (Educational note: There are 24 cosets for $PSL(2, \mathbb{Z})/K_4$
and 60 cosets for $PSL(2, \mathbb{Z})/K_5$.)

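44. Take for granted that $H_m = K_m$ for $2 \le m \le 5$. Deduce the isomorphisms

(a) $G_2 \cong PSL(2, \mathbb{Z}/2\mathbb{Z}) \cong \mathfrak{S}_3$.

(b) $G_3 \cong PSL(2, \mathbb{Z}/3\mathbb{Z}) \cong \mathfrak{A}_4$. (This group is called the tetrahedral group.)

(c) $G_4 \cong PSL(2, \mathbb{Z}/4\mathbb{Z}) \cong \mathfrak{S}_4$. (This group is called the octahedral group.)

(d) $G_5 \cong PSL(2, \mathbb{Z}/5\mathbb{Z}) \cong \mathfrak{A}_5$. (This group is called the icosahedral group.)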

45. A translation in the Euclidean plane $\mathbb{R}^2$ is any function $T_{(a,b)}(x, y) = (a + x, b + y)$.
The rotation about the origin clockwise through the angle θ
is the linear map $R_\theta$ given by the matrix $\begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}$, and the rotation about
$(x_0, y_0)$ clockwise through the angle θ is the map given by
$(x, y) \mapsto R_\theta(x - x_0, y - y_0) + (x_0, y_0)$.

(a) Prove that $R_\theta T_{(a,b)} R_\theta^{-1} = T_{R_\theta(a,b)}$.

(b) Prove that the union of the set of translations and all the sets of rotations
about points of $\mathbb{R}^2$ is a group by showing that it is the semidirect product
of the subgroup of rotations about the origin and the normal subgroup of
translations.

46. Fix a triangle T in the Euclidean plane with vertices arranged counterclockwise
at a, b, c and with angles π/2 at a, π/3 at b, and π/6 at c. Let $r_a$ be rotation
clockwise through π at a, $r_b$ be rotation clockwise through 2π/3 at b, and $r_c$ be
rotation counterclockwise through π/3 at c.

(a) Show that $r_a^2 = 1$, $r_b^3 = 1$, $r_c^6 = 1$, and $r_c = r_ar_b$.

(b) Show that the member $r_br_ar_br_ar_b$ of the group generated by $r_a$ and $r_b$ is a
nontrivial translation and therefore that the generated group is infinite.

(c) Conclude that $G_6 \not\cong PSL(2, \mathbb{Z}/6\mathbb{Z})$. (Educational note: If T′ denotes the
union of T and the reflection of T in one of the sides of T, it can be shown
that the group generated by $r_a$ and $r_b$ is isomorphic to $G_6$ and tiles the plane
with copies of T′.)
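The claims in (a) and (b) can be checked numerically by identifying the plane with C. Rotation about $z_0$ through the angle φ (counterclockwise for φ > 0) is then $z \mapsto z_0 + e^{i\varphi}(z - z_0)$.

The sketch makes two assumptions:

- T has the coordinates a = 0, b = 1, c = i√3.
- The product $r_ar_b$ is read as "apply $r_a$, then $r_b$".

The word $r_br_ar_br_ar_b$ is a palindrome, so its value does not depend on the second choice. The function names are hypothetical.

```python
import cmath
import math

def rotation(center, angle):
    """Rotation of the plane (viewed as C) about center; counterclockwise if angle > 0."""
    w = cmath.exp(1j * angle)
    return lambda z: center + w * (z - center)

def compose(*maps):
    """compose(f, g)(z) = f(g(z)): the rightmost map acts first."""
    def composite(z):
        for f in reversed(maps):
            z = f(z)
        return z
    return composite

# Assumed coordinates: counterclockwise vertices with angles pi/2, pi/3, pi/6.
a, b, c = 0j, 1 + 0j, 1j * math.sqrt(3)
r_a = rotation(a, -math.pi)            # clockwise through pi
r_b = rotation(b, -2 * math.pi / 3)    # clockwise through 2pi/3
r_c = rotation(c, math.pi / 3)         # counterclockwise through pi/3

SAMPLES = [0j, 1 + 2j, -3 + 0.5j, 2.5 - 1j]

def same_map(f, g):
    """Affine maps of C agreeing at these points are equal."""
    return all(abs(f(z) - g(z)) < 1e-9 for z in SAMPLES)

identity = lambda z: z
print(same_map(compose(r_a, r_a), identity),
      same_map(compose(r_b, r_b, r_b), identity),
      same_map(compose(*[r_c] * 6), identity))       # True True True
print(same_map(compose(r_b, r_a), r_c))              # True: first r_a, then r_b

word = compose(r_b, r_a, r_b, r_a, r_b)              # palindrome
shifts = [word(z) - z for z in SAMPLES]              # constant shift means translation
print(abs(shifts[0]) > 1e-6, all(abs(s - shifts[0]) < 1e-9 for s in shifts))  # True True
```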

Problems 47–52 establish a harmonic analysis for arbitrary representations of finite
groups on complex vector spaces, whether finite-dimensional or infinite-dimensional.
Let G be a finite group, and let V be a complex vector space. For any representation
R of G on V, one defines $R(f)v = \sum_{x \in G} f(x)R(x)v$ for f in C(G, C) and v in V,
just as in the case that V is finite-dimensional. The same computation as in Section
VII.4 shows that the formula $R(f_1 * f_2) = R(f_1)R(f_2)$ remains valid when V is
infinite-dimensional.

47. Let $(R_1, V_1)$ and $(R_2, V_2)$ be irreducible finite-dimensional representations of G
on complex vector spaces, and let $\chi_{R_1}$ and $\chi_{R_2}$ be their characters. Using Schur
orthogonality, prove that

(a) $\chi_{R_1} * \chi_{R_2} = 0$ if $R_1$ and $R_2$ are inequivalent,

(b) $\chi_{R_1} * \chi_{R_1} = |G|\,d_{R_1}^{-1}\chi_{R_1}$, where $d_{R_1} = \dim V_1$.

48. With $(R, V)$ given, let $(R_\alpha, V_\alpha)$ be any irreducible finite-dimensional
representation of G, and define $E_\alpha : V \to V$ by $E_\alpha = |G|^{-1}d_\alpha R(\overline{\chi_\alpha})$, where $\chi_\alpha$ is the
character of $R_\alpha$ and where $d_\alpha = \dim V_\alpha$.

(a) Prove that $E_\alpha^2 = E_\alpha$.

(b) Prove that $E_\alpha E_\beta = E_\beta E_\alpha = 0$ if $(R_\beta, V_\beta)$ is an irreducible finite-dimensional
representation of G such that $R_\alpha$ and $R_\beta$ are inequivalent.
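A small concrete case shows the operators $E_\alpha$ at work. The sketch takes G = $\mathfrak{S}_3$ acting on $\mathbb{C}^3$ by permuting coordinates, and uses the three irreducible characters of $\mathfrak{S}_3$ (trivial, sign, and standard). It uses exact rational arithmetic and hypothetical helper names. It builds $E_\alpha = (d_\alpha/|G|)\sum_x \overline{\chi_\alpha(x)}R(x)$ and checks the following:

- Each $E_\alpha$ is idempotent.
- Distinct $E_\alpha$ annihilate each other.
- The $E_\alpha$ sum to the identity, as in Problem 51.
- $E_{\text{sign}} = 0$, because the sign representation does not occur in this R.

```python
from fractions import Fraction
from itertools import permutations

G = list(permutations(range(3)))      # S_3; s[i] is the image of i

def perm_matrix(s):
    """Matrix of R(s) on C^3, where R(s) e_j = e_{s(j)}."""
    return [[1 if s[j] == i else 0 for j in range(3)] for i in range(3)]

def sign(s):
    inversions = sum(1 for i in range(3) for j in range(i + 1, 3) if s[i] > s[j])
    return -1 if inversions % 2 else 1

def fixed_points(s):
    return sum(1 for i in range(3) if s[i] == i)

# (dimension, character) of each irreducible representation of S_3.
# All three characters are real-valued, so complex conjugation changes nothing.
IRREDUCIBLES = {
    "trivial": (1, lambda s: 1),
    "sign": (1, sign),
    "standard": (2, lambda s: fixed_points(s) - 1),
}

def projection(d, chi):
    """E = |G|^{-1} d R(conj(chi)) = (d/|G|) * sum over x of conj(chi(x)) R(x)."""
    E = [[Fraction(0)] * 3 for _ in range(3)]
    for s in G:
        weight = Fraction(d * chi(s), len(G))
        P = perm_matrix(s)
        for i in range(3):
            for j in range(3):
                E[i][j] += weight * P[i][j]
    return E

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

E = {name: projection(d, chi) for name, (d, chi) in IRREDUCIBLES.items()}
I3 = [[1 if i == j else 0 for j in range(3)] for i in range(3)]
total = [[sum(E[name][i][j] for name in E) for j in range(3)] for i in range(3)]
print(total == I3)                                # True: the E_alpha sum to the identity
print(E["sign"] == [[0] * 3 for _ in range(3)])   # True: sign does not occur in this R
```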

49. Observe for each v in V that $\{R(x)v \mid x \in G\}$ spans a finite-dimensional invariant
subspace of V. By Corollary 7.21, each v in V lies in a finite direct sum of
finite-dimensional invariant subspaces of V on each of which R acts irreducibly. Using
Zorn's Lemma, prove that V is the direct sum of finite-dimensional subspaces
on each of which R acts irreducibly. (If V is infinite-dimensional, there will of
course be infinitely many such subspaces.)

50. Suppose that $V_0$ is a finite-dimensional invariant subspace of V such that $R|_{V_0}$
is equivalent to some $R_\alpha$, where $R_\alpha$ is as in Problem 48. Prove that $E_\alpha$ is the
identity on $V_0$.

51. Deduce that if $\{(R_\alpha, V_\alpha)\}$ is a maximal collection of inequivalent
finite-dimensional irreducible representations of G, then $\sum_\alpha E_\alpha = I$ on V and the
image of $E_\alpha$ is the set of all sums of vectors in V lying in some finite-dimensional
invariant subspace $V_0$ of V such that $R|_{V_0}$ is equivalent to $R_\alpha$. (Educational note:
Consequently V is exhibited as the finite direct sum of the spaces image $E_\alpha$,
each space image $E_\alpha$ is the direct sum of finite-dimensional irreducible invariant
subspaces, and the restriction of R to any finite-dimensional irreducible invariant
subspace of image $E_\alpha$ is equivalent with $R_\alpha$.)

52. Suppose that $(R_\alpha, V_\alpha)$ is a 1-dimensional representation of G given by a
multiplicative character ω. Prove that the image of $E_\alpha$ consists of all vectors v in V
such that $R(x)v = \omega(x)v$ for all x in G.


CHAPTER  VIII 


Commutative  Rings  and  Their  Modules 


Abstract.  This  chapter  amplifies  the  theory  of  commutative  rings  that  was  begun  in  Chapter  IV, 
and  it  introduces  modules  for  any  ring.  Emphasis  is  on  the  topic  of  unique  factorization. 

Section  1  gives  many  examples  of  rings,  some  commutative  and  some  noncommutative,  and 
introduces  the  notion  of  a  module  for  a  ring. 

Sections 2–4 discuss some of the tools related to questions of factorization in integral domains.
Section 2 defines the field of fractions for an integral domain and gives its universal mapping property.
Section 3 defines prime and maximal ideals and relates quotients by them to integral domains and
fields. Section 4 introduces principal ideal domains, which are shown to have unique factorization,
and it defines Euclidean domains as a special kind of principal ideal domain for which greatest
common divisors can be obtained constructively.

Section 5 proves that if R is an integral domain with unique factorization, then so is the polynomial
ring R[X]. This result is a consequence of Gauss's Lemma, which addresses what happens to the
greatest common divisor of the coefficients when one multiplies two members of R[X]. Gauss's
Lemma has several other consequences that relate factorization in R[X] to factorization in F[X],
where F is the field of fractions of R. Still another consequence is Eisenstein's irreducibility criterion,
which gives a sufficient condition for a member of R[X] to be irreducible.

Section 6 contains the theorem that every finitely generated unital module over a principal ideal
domain is a direct sum of cyclic modules. The cyclic modules may be assumed to be primary in a
suitable sense, and then the isomorphism types of the modules appearing in the direct-sum
decomposition, together with their multiplicities, are uniquely determined. The main results transparently
generalize the Fundamental Theorem for Finitely Generated Abelian Groups, and less transparently
they generalize the existence and uniqueness of Jordan canonical form for square matrices with
entries in an algebraically closed field.

Sections 7–11 contain foundational material related to factorization for the two subjects of
algebraic number theory and algebraic geometry. Both these subjects rely heavily on the theory of
commutative rings. Section 7 is a section of motivation, showing the analogy between a situation
in algebraic number theory and a situation in algebraic geometry. Sections 8–10 introduce
Noetherian rings, integral closures, and localizations. Section 11 uses this material to establish unique
factorization of ideals for Dedekind domains, as well as some other properties.


1.  Examples  of  Rings  and  Modules 

Sections 4–5 of Chapter IV introduced rings and fields, giving a small number of
examples of each. In the present section we begin by recalling those examples
and giving further ones. Although Chapters VI and VII are not prerequisite for
the present chapter, our list of examples will include some rings and fields that
arose in those two chapters.

The  theory  to  be  developed  in  this  chapter  is  intended  to  apply  to  commutative 
rings,  especially  to  questions  related  to  unique  factorization  in  such  rings.  Despite 
this  limitation  it  seems  wise  to  include  examples  of  noncommutative  rings  in  the 
list  below. 

In  the  conventions  of  this  book,  a  ring  need  not  have  an  identity.  Many  rings 
that  arise  only  in  the  subject  of  algebra  have  an  identity,  but  there  are  important 
rings  in  the  subject  of  real  analysis  that  do  not.  From  the  point  of  view  of  category 
theory,  one  therefore  distinguishes  between  the  category  of  all  rings,  with  ring 
homomorphisms  as  morphisms,  and  the  category  of  all  rings  with  identity,  with 
ring  homomorphisms  carrying  1  to  1  as  morphisms.  In  the  latter  case  one  may 
want  to  exclude  the  zero  ring  from  being  an  object  in  the  category  under  certain 
circumstances. 

Examples  of  rings. 

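(1) Basic commutative rings from Chapter IV. All of the structures $\mathbb{Z}$, $\mathbb{Q}$, $\mathbb{R}$,
$\mathbb{C}$, $\mathbb{Z}/m\mathbb{Z}$, and $2\mathbb{Z}$ are commutative rings. All but the last have an identity. Of
these, $\mathbb{Q}$, $\mathbb{R}$, and $\mathbb{C}$ are fields, and so is $\mathbb{F}_p = \mathbb{Z}/p\mathbb{Z}$ if p is a prime number. The
others are not fields.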
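A short check illustrates the last two sentences. $\mathbb{Z}/m\mathbb{Z}$ with m ≥ 2 is a nonzero commutative ring with identity, so it is a field exactly when every nonzero class has a multiplicative inverse. A brute-force search (hypothetical function name) finds this happening only for prime m in a small range.

```python
def is_field_mod(m):
    """Z/mZ (m >= 2) is a field iff every nonzero class has a multiplicative inverse."""
    return all(any(a * b % m == 1 for b in range(1, m)) for a in range(1, m))

print([m for m in range(2, 20) if is_field_mod(m)])   # [2, 3, 5, 7, 11, 13, 17, 19]
```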

(2) Polynomial rings. Let R be a nonzero commutative ring with identity.
In Section IV.5 we defined the commutative ring $R[X_1, \dots, X_n]$ of polynomials
over R in n indeterminates. It has a universal mapping property with respect to
substitution for the indeterminates and use of a homomorphism on the coefficients.
Making substitutions from R itself and mapping the coefficients by the identity
homomorphism, we are led to the ring of all functions $(r_1, \dots, r_n) \mapsto f(r_1, \dots, r_n)$
for $r_1, \dots, r_n$ in R and $f(X_1, \dots, X_n)$ in $R[X_1, \dots, X_n]$; this is called the ring
of all polynomial functions in n variables on R. Polynomials may be considered
also in infinitely many variables, but we did not treat this case in any detail.

(3) Matrix rings over commutative rings. Let R be a nonzero commutative
ring with identity. The set $M_n(R)$ of all n-by-n matrices with entries in R is a ring
under entry-by-entry addition and the usual definition of matrix multiplication:
$(AB)_{ij} = \sum_{k=1}^n A_{ik}B_{kj}$. It has an identity, namely the identity matrix I with
$I_{ij} = \delta_{ij}$. In this setting, Section V.2 introduced a theory of determinants, and it
was proved that a matrix has a one-sided inverse if and only if it has a two-sided
inverse, if and only if its determinant is a member of the group $R^\times$ of units in
R, i.e., elements of R invertible under multiplication. The matrix ring $M_n(R)$ is
always noncommutative if n > 1.
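For a small commutative ring with identity such as $\mathbb{Z}/4\mathbb{Z}$, the criterion above can be confirmed by brute force for 2-by-2 matrices. A right inverse exists exactly when the determinant is a unit, i.e., when GCD(det A, 4) = 1. The sketch also exhibits two matrices that do not commute. The helper names are hypothetical.

```python
from itertools import product
from math import gcd

def all_matrices(m):
    """All 2-by-2 matrices over Z/mZ as ((a, b), (c, d))."""
    return [((a, b), (c, d)) for a, b, c, d in product(range(m), repeat=4)]

def mul_mod(A, B, m):
    return tuple(tuple(sum(A[i][k] * B[k][j] for k in range(2)) % m for j in range(2))
                 for i in range(2))

def det_mod(A, m):
    return (A[0][0] * A[1][1] - A[0][1] * A[1][0]) % m

def criterion_holds(m):
    """True if, for every A, 'A has a right inverse' matches 'det A is a unit mod m'."""
    mats, identity = all_matrices(m), ((1, 0), (0, 1))
    return all(any(mul_mod(A, B, m) == identity for B in mats) == (gcd(det_mod(A, m), m) == 1)
               for A in mats)

print(criterion_holds(4))                           # True
E12, E21 = ((0, 1), (0, 0)), ((0, 0), (1, 0))
print(mul_mod(E12, E21, 4), mul_mod(E21, E12, 4))   # ((1, 0), (0, 0)) ((0, 0), (0, 1))
```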

(4) Matrix rings over noncommutative rings. If R is any ring, we can still make
the set $M_n(R)$ of all n-by-n matrices with entries in R into a ring. However, if
R has no identity, $M_n(R)$ will have no identity. The theory of determinants does
not directly apply if R is noncommutative or if R fails to have an identity,¹ and as
a consequence, questions about the invertibility of matrices are more subtle than
in the previous example.

(5) Spaces of linear maps from a vector space into itself. Let V be a vector
space over a field K. The vector space $\mathrm{End}_K(V) = \mathrm{Hom}_K(V, V)$ of all K linear
maps from V to itself is initially a vector space over K. Composition provides a
multiplication that makes $\mathrm{End}_K(V)$ into a ring with identity. In fact, associativity
of multiplication is automatic for any kind of function, and so is the distributive law
$(L_1 + L_2)L_3 = L_1L_3 + L_2L_3$. The distributive law $L_1(L_2 + L_3) = L_1L_2 + L_1L_3$
follows from the fact that $L_1$ is linear. This ring is isomorphic as a ring to $M_n(K)$
if V is n-dimensional, an isomorphism being determined by specifying an ordered
basis of V.

(6) Associative algebras over fields. These were defined in Section VI.7,
knowledge of which is not being assumed now. Thus we repeat the definition. If
K is a field, then an associative algebra over K, or associative K algebra, is a ring
A that is also a vector space over K such that the multiplication $A \times A \to A$ is
K-linear in each variable. The conditions of linearity concerning multiplication
have two parts to them: an additive part saying that the usual distributive laws
are valid and a scalar-multiplication part saying that

$(ka)b = k(ab) = a(kb)$ for all k in K and a, b in A.

If A has an identity, the displayed condition says that all scalar multiples of
the identity lie in the center of A, i.e., commute with every element of A. In
Examples 2 and 3, when R is a field K, the polynomial rings and matrix rings
over K provide examples of associative algebras over K; scalar multiplication is
to be done in entry-by-entry fashion. Example 5 is an associative algebra as well.
If L is any field such that K is a subfield, then L may be regarded as an associative
algebra over K. An interesting commutative associative algebra over $\mathbb{C}$ without
identity is the algebra $C_{\mathrm{com}}(\mathbb{R})$ of all continuous complex-valued functions on $\mathbb{R}$
that vanish outside a bounded interval; the vector-space operations are the usual
pointwise operations, and the operation of multiplication is given by convolution

$(f * g)(x) = \int_{\mathbb{R}} f(x - y)g(y)\,dy.$

Section VII.4 worked with an analog C(G, C) of this algebra in the context that
$\mathbb{R}$ is replaced by a finite group G.
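A finite analog makes the convolution product concrete. The sketch below uses the cyclic group $\mathbb{Z}/n\mathbb{Z}$ written additively, so that $(f * g)(x) = \sum_y f(x - y)g(y)$, and checks commutativity and associativity on sample functions. The function name is hypothetical. Unlike $C_{\mathrm{com}}(\mathbb{R})$, this finite version does have an identity: the function equal to 1 at 0 and to 0 elsewhere.

```python
def convolve(f, g):
    """(f * g)(x) = sum over y in Z/nZ of f(x - y) g(y); f and g are lists of length n."""
    n = len(f)
    return [sum(f[(x - y) % n] * g[y] for y in range(n)) for x in range(n)]

f, g, h = [1, 2, 0, -1, 3], [0, 1, 1, 0, 2], [4, 0, -2, 1, 1]
delta = [1, 0, 0, 0, 0]
print(convolve(f, g) == convolve(g, f))                            # True
print(convolve(convolve(f, g), h) == convolve(f, convolve(g, h)))  # True
print(convolve(delta, f) == f)                                     # True
```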

¹ A limited theory of determinants applies in the noncommutative case, but it will not be helpful
for our purposes.

(7) Division rings. A division ring is a nonzero ring with identity such that every
nonzero element has a two-sided inverse under multiplication. A commutative division
ring is just a field. The ring $\mathbb{H}$ of quaternions is the only explicit noncommutative
division ring that we have encountered so far. It is an associative algebra over $\mathbb{R}$.
More generally, if A is a division ring, then we can easily check that the center
K of A is a field and that A is an associative algebra over K.²
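Quaternion arithmetic illustrates both halves of the definition. On one hand $ij = k$ but $ji = -k$, so $\mathbb{H}$ is noncommutative. On the other hand, every nonzero $q = a + bi + cj + dk$ has the two-sided inverse $\bar q/(a^2 + b^2 + c^2 + d^2)$. The sketch uses exact rational coefficients, and the class name is a hypothetical choice.

```python
from fractions import Fraction
from typing import NamedTuple

class Quaternion(NamedTuple):
    """a + b i + c j + d k, with i^2 = j^2 = k^2 = ijk = -1."""
    a: Fraction
    b: Fraction
    c: Fraction
    d: Fraction

    def __mul__(self, o):
        a1, b1, c1, d1 = self
        a2, b2, c2, d2 = o
        return Quaternion(a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
                          a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
                          a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
                          a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2)

    def conjugate(self):
        return Quaternion(self.a, -self.b, -self.c, -self.d)

    def inverse(self):
        """q^{-1} = conj(q) / |q|^2, defined for every nonzero q."""
        n = self.a ** 2 + self.b ** 2 + self.c ** 2 + self.d ** 2
        if n == 0:
            raise ZeroDivisionError("0 has no inverse")
        return Quaternion(*(Fraction(x) / n for x in self.conjugate()))

ONE = Quaternion(1, 0, 0, 0)
QI, QJ, QK = Quaternion(0, 1, 0, 0), Quaternion(0, 0, 1, 0), Quaternion(0, 0, 0, 1)
print(QI * QJ == QK, QJ * QI == QK)                        # True False
q = Quaternion(1, 2, 3, 4)
print(q * q.inverse() == ONE, q.inverse() * q == ONE)      # True True
```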

(8) Tensor, symmetric, and exterior algebras. If E is a vector space over a field
K, Chapter VI defined the tensor, symmetric, and exterior algebras of E over K, as
well as the polynomial algebra on E in the case that E is finite-dimensional. These
are all associative algebras with identity. Symmetric algebras and polynomial
algebras are commutative. None of these algebras will be discussed further in
this chapter.

(9)  A  field  of  4  elements.  This  was  constructed  in  Section  IV.4.  Further  finite 
fields  beyond  the  field  of  4  elements  and  the  fields  Fp  =  Z/pZ  with  p  prime  will 
be  constructed  in  Chapter  IX. 

(10) Algebraic number fields Q[θ]. These were discussed in Sections IV.1
and IV.4. In defining Q[θ], we assume that θ is a complex number and that
there exists an integer n > 0 such that the complex numbers $1, \theta, \theta^2, \dots, \theta^n$
are linearly dependent over Q. The set Q[θ] is defined to be the subset of C
obtained by substitution of θ into all members of Q[X]. It coincides with the
linear span over Q of $1, \theta, \theta^2, \dots, \theta^{n-1}$. Proposition 4.1 shows that it is closed
under the arithmetic operations, including passage to multiplicative inverses of
nonzero elements, and it is therefore a subfield of C. This example ties in with
the notion of minimal polynomial in Chapter V because the members of Q[X]
with θ as a root are all multiples of one nonzero such polynomial that exhibits the
linear dependence. We return to this example occasionally later in this chapter,
particularly in Sections 7–11, and then we treat it in more detail in Chapter IX.
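For θ = √2 the closure of Q[θ] under inverses can be made explicit. Here n = 2 in the description above, since θ² − 2 = 0 exhibits the linear dependence of 1, θ, θ². The inverse is $(a + b\sqrt2)^{-1} = (a - b\sqrt2)/(a^2 - 2b^2)$. The denominator is nonzero for (a, b) ≠ (0, 0) because √2 is irrational. The following minimal sketch uses exact rational arithmetic and a hypothetical class name.

```python
from fractions import Fraction

class QSqrt2:
    """The element a + b*sqrt(2) of Q[sqrt(2)], stored exactly as two rationals."""

    def __init__(self, a, b=0):
        self.a, self.b = Fraction(a), Fraction(b)

    def __add__(self, other):
        return QSqrt2(self.a + other.a, self.b + other.b)

    def __mul__(self, other):
        # (a1 + b1 s)(a2 + b2 s) = (a1 a2 + 2 b1 b2) + (a1 b2 + b1 a2) s, using s^2 = 2
        return QSqrt2(self.a * other.a + 2 * self.b * other.b,
                      self.a * other.b + self.b * other.a)

    def inverse(self):
        norm = self.a ** 2 - 2 * self.b ** 2   # nonzero unless a = b = 0
        if norm == 0:
            raise ZeroDivisionError("0 has no inverse")
        return QSqrt2(self.a / norm, -self.b / norm)

    def __eq__(self, other):
        return (self.a, self.b) == (other.a, other.b)

    def __repr__(self):
        return f"QSqrt2({self.a}, {self.b})"

x = QSqrt2(3, -5)
print(x.inverse())                   # QSqrt2(-3/41, -5/41)
print(x * x.inverse() == QSqrt2(1))  # True
```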

(11) Algebraic integers in a number field Q[θ]. Algebraic integers were defined
in Section VII.4 as the roots in C of monic polynomials in Z[X], and they were
shown to form a commutative ring with identity. The set of algebraic integers
in Q[θ] is therefore a commutative ring with identity, and it plays somewhat
the same role for Q[θ] that Z plays for Q. We discuss this example further in
Sections 7–11.

(12) Integral group rings. If G is a group, then we can make the free abelian
group $\mathbb{Z}G$ on the elements of G into a ring by defining multiplication to be
$\big(\sum_i m_ig_i\big)\big(\sum_j n_jh_j\big) = \sum_{i,j}(m_in_j)(g_ih_j)$ when the $m_i$ and $n_j$ are in $\mathbb{Z}$ and the
$g_i$ and $h_j$ are in G. It is immediate that the result is a ring with identity, and $\mathbb{Z}G$
is called the integral group ring of G. The group G is embedded as a subgroup
of the group $(\mathbb{Z}G)^\times$ of units of $\mathbb{Z}G$, each element g being identified with a
sum $\iota(g) = \sum m_ig_i$ in which the only nonzero term is 1g. The ring $\mathbb{Z}G$ has
the universal mapping property illustrated in Figure 8.1 and described as follows:
whenever $\varphi : G \to R^\times$ is a group homomorphism of G into the group $R^\times$ of units
of a ring R, then there exists a unique ring homomorphism $\Phi : \mathbb{Z}G \to R$ such
that $\Phi \circ \iota = \varphi$. The existence of Φ as a homomorphism of additive groups follows
from the universal mapping property of free abelian groups, and then one readily
checks that Φ respects multiplication.³

² Use of the term "division algebra" requires some care. Some mathematicians understand
division algebras to be associative, and others do not. The real algebra $\mathbb{O}$ of octonions, as defined in
Problems 52–56 at the end of Chapter VI, is not associative, but it does have division.

[Diagram: φ : G → R across the top, ι : G → ZG down the side, and Φ : ZG → R completing the triangle.]

Figure 8.1. Universal mapping property of the integral group ring of G.
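Elements of $\mathbb{Z}G$ can be stored as finitely supported maps from G to $\mathbb{Z}$. The sketch below uses hypothetical names throughout. It takes G = $\mathfrak{S}_3$ (permutations of {0, 1, 2} as tuples), implements the multiplication rule of Example 12, and checks the following on examples:

- The multiplication is associative.
- $\mathbb{Z}G$ is noncommutative here.
- The sign homomorphism $\varphi : G \to \{\pm 1\} = \mathbb{Z}^\times$ extends to a multiplicative map $\Phi(\sum m_g g) = \sum m_g\varphi(g)$, as the universal mapping property predicts.

```python
from collections import defaultdict
from itertools import permutations

G = list(permutations(range(3)))          # the symmetric group on {0, 1, 2}

def compose(s, t):
    """Product st in G: apply t first, then s."""
    return tuple(s[t[i]] for i in range(len(t)))

def ring_mul(u, v):
    """(sum m_g g)(sum n_h h) = sum (m_g n_h)(gh); elements are dicts g -> integer."""
    out = defaultdict(int)
    for g, m in u.items():
        for h, n in v.items():
            out[compose(g, h)] += m * n
    return {g: c for g, c in out.items() if c != 0}

def iota(g):
    """The embedding G -> ZG: the sum whose only nonzero term is 1g."""
    return {g: 1}

def sign(s):
    inversions = sum(1 for i in range(len(s)) for j in range(i + 1, len(s)) if s[i] > s[j])
    return -1 if inversions % 2 else 1

def Phi(u):
    """Extension of the sign homomorphism G -> Z^x to a ring homomorphism ZG -> Z."""
    return sum(m * sign(g) for g, m in u.items())

u = {G[0]: 2, G[1]: -1, G[4]: 3}
v = {G[2]: 1, G[5]: 4}
w = {G[3]: -2, G[1]: 1}
print(ring_mul(ring_mul(u, v), w) == ring_mul(u, ring_mul(v, w)))      # True
print(ring_mul(iota(G[1]), iota(G[2])) == ring_mul(iota(G[2]), iota(G[1])))  # False
print(Phi(ring_mul(u, v)) == Phi(u) * Phi(v))                           # True
```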

(13) Quotient rings. If R is a ring and I is a two-sided ideal, then we saw in
Section IV.4 that the additive quotient R/I has a natural multiplication that makes
it into a ring called a quotient ring of R. This in effect was the construction that
obtained the ring $\mathbb{Z}/m\mathbb{Z}$ from the ring $\mathbb{Z}$.

(14) Direct product of rings. If $\{R_s \mid s \in S\}$ is a nonempty set of rings, then
a direct product $\prod_{s \in S} R_s$ is a ring whose additive group is any direct product
of the underlying additive groups and whose ring operations are given in
entry-by-entry fashion. The resulting ring and the associated ring homomorphisms
$p_{s_0} : \prod_{s \in S} R_s \to R_{s_0}$ amount to the product functor for the category of rings;
if each $R_s$ has an identity, the result amounts also to the product functor for the
category of rings with identity.

We  give  further  examples  of  rings  near  the  end  of  this  section  after  we  have 
defined  modules  and  given  some  examples. 

Informally a module is a vector space over a ring. But let us be more precise.
If R is a ring, then a left R module⁴ M is an abelian group with the additional
structure of a “scalar multiplication” $R \times M \to M$ such that

(i) $r(r'm) = (rr')m$ for r and r′ in R and m in M,

(ii) $(r + r')m = rm + r'm$ and $r(m + m') = rm + rm'$ if r and r′ are in R
and m and m′ are in M.

In addition, if R has an identity, we say that M is unital if

(iii) $1m = m$ for all m in M.

³ Universal mapping properties are discussed systematically in Problems 18–22 at the end of
Chapter VI. The subject of such a property, here the pair $(\mathbb{Z}G, \iota)$, is always unique up to canonical
isomorphism in a given category, but its existence has to be proved.

⁴ Many algebra books write "R-module," using a hyphen. However, when R is replaced by an
expression, particularly in applications of the theory, the hyphen is often dropped. For an example,
see "module" in Hall's The Theory of Groups. The present book omits the hyphen in all cases in
order to be consistent.

One  may  also  speak  of  right  R  modules.  For  these  the  scalar  multiplication  is 
usually  written  as  mr  with  m  in  M  and  r  in  R,  and  the  expected  analogs  of  (i) 
and  (ii)  are  to  hold. 

When  R  is  commutative,  it  is  immaterial  which  side  is  used  for  the  scalar 
multiplication,  and  one  speaks  simply  of  an  R  module. 

Let R be a ring, and let M and N be two left R modules. A homomorphism of left R modules, or more briefly an R homomorphism, is an additive group homomorphism φ : M → N such that φ(rm) = rφ(m) for all r in R and m in M. Then we
can  form  a  category  for  fixed  R  in  which  the  objects  are  the  left  R  modules  and 
the  morphisms  are  the  R  homomorphisms  from  one  left  R  module  to  another. 
Similarly the right R modules, along with the corresponding kind of R homomorphisms, form a category. If R has an identity, then the unital R modules form
a  subcategory  in  each  case.  These  categories  are  fundamental  to  the  subject  of 
homological  algebra,  which  we  take  up  in  Chapter  IV  of  Advanced  Algebra. 

Examples  of  modules. 

(1) Vector spaces. If K is a field, the unital K modules are exactly the vector spaces over K.

(2)  Abelian  groups.  The  unital  Z  modules  are  exactly  the  abelian  groups. 
Scalar  multiplication  is  given  in  the  expected  way:  If  n  is  a  positive  integer,  the 
product  nx  is  the  n-fold  sum  of  x  with  itself.  If  n  =  0,  the  product  nx  is  0.  If 
n < 0, the product nx is −((−n)x).
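The recipe for nx can be written as a short function. The following Python sketch is an illustration only: it assumes the abelian group is presented by hypothetical parameters `add`, `neg`, and `zero`, uses Z/7 as the sample group, and checks axioms (i) through (iii) over a small range of integers.

```python
def z_scalar(n, x, add, neg, zero):
    """Scalar multiplication n*x making an abelian group into a unital Z module.

    The group is described by its addition `add`, negation `neg`, and
    identity `zero` (hypothetical parameters for this sketch)."""
    if n == 0:
        return zero
    if n < 0:
        return neg(z_scalar(-n, x, add, neg, zero))   # nx = -((-n)x)
    result = zero
    for _ in range(n):          # the n-fold sum x + x + ... + x
        result = add(result, x)
    return result

# Sample group: Z/7 under addition.
def add7(a, b):
    return (a + b) % 7

def neg7(a):
    return (-a) % 7

def smul(n, x):
    return z_scalar(n, x, add7, neg7, 0)

RANGE = range(-4, 5)
axiom_i = all(smul(m, smul(n, x)) == smul(m * n, x)
              for m in RANGE for n in RANGE for x in range(7))
axiom_ii = all(smul(m + n, x) == add7(smul(m, x), smul(n, x))
               and smul(m, add7(x, y)) == add7(smul(m, x), smul(m, y))
               for m in RANGE for n in RANGE for x in range(7) for y in range(7))
axiom_iii = all(smul(1, x) == x for x in range(7))
print(axiom_i, axiom_ii, axiom_iii)   # True True True
```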

(3) Vector spaces as unital modules for the polynomial ring K[X]. Let V be a finite-dimensional vector space over the field K, and fix L in End_K(V). Then V becomes a unital K[X] module under the definition A(X)v = A(L)(v) whenever A(X) is a polynomial in K[X]; here A(L) is the member of End_K(V) defined as in Section V.3. In Section 6 in this chapter we shall see that some of
the  deeper  results  in  the  theory  of  a  single  linear  transformation,  as  developed  in 
Chapter  V,  follow  from  the  theory  of  unital  K[X]  modules  that  will  emerge  from 
the  present  chapter. 
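The module action A(X)v = A(L)(v) can be computed with Horner's rule, which applies L repeatedly instead of forming powers of L. The sketch below is a minimal Python illustration, assuming K = Q (exact arithmetic via `fractions.Fraction`), V = Q², and a hypothetical choice of L, namely rotation by a right angle, so that L² is minus the identity. The helper names are ours, not the text's.

```python
from fractions import Fraction

# Hypothetical choice of L in End_Q(Q^2): rotation by 90 degrees, so L^2 = -1.
L = [[Fraction(0), Fraction(-1)],
     [Fraction(1), Fraction(0)]]

def apply(M, v):
    """Matrix-vector product M v."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def poly_act(coeffs, v):
    """Return A(X)v = A(L)(v), where coeffs = [a0, a1, ..., an] lists the
    coefficients of A(X) = a0 + a1 X + ... + an X^n (Horner's rule)."""
    result = [Fraction(0)] * len(v)
    for a in reversed(coeffs):
        result = [r + a * x for r, x in zip(apply(L, result), v)]
    return result

def poly_mul(p, q):
    """Coefficient list of the product polynomial p(X) q(X)."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

v = [Fraction(3), Fraction(5)]
A, B = [1, 2], [0, 1, 3]          # A(X) = 1 + 2X, B(X) = X + 3X^2
print(poly_act([1, 0, 1], v))     # X^2 + 1 acts as 0 for this L
print(poly_act(A, poly_act(B, v)) == poly_act(poly_mul(A, B), v))
```

The second print checks one instance of axiom (i), A(Bv) = (AB)v; it holds because polynomials in the same L commute and A(L)B(L) = (AB)(L).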

(4) Modules in the context of algebraic number fields. Let Q[θ] be a subfield of C as in Example 10 of rings earlier in this section. It is assumed that the Q vector space Q[θ] is finite-dimensional. Let L be the member of End_Q(Q[θ]) given as left multiplication by θ on Q[θ]. As in the previous example, Q[θ] becomes a unital Q[X] module. Chapter V defines a minimal polynomial for L, as well as a characteristic polynomial. These objects play a role in the study to be carried out in Chapter IX of fields like Q[θ]. If θ is an algebraic integer as in Example 11 of rings earlier in this section, then we can get more refined information by replacing Q by Z in the above analysis; this technique plays a role in the theory to be developed in Sections 7–11.

(5) Rings and their quotients. If R is a ring, then R is a left R module and also a right R module. If I is a two-sided ideal in R, then the quotient ring R/I, as defined in Proposition 4.20, is a left R module and also a right R module. These modules are automatically unital if R has an identity. Later in this section we shall consider quotients of R by “one-sided ideals.”

(6) Spaces of rectangular matrices. If R is a ring, then the space M_mn(R) of m-by-n matrices with entries in R is an abelian group under addition and becomes a left R module when multiplication by the scalar r is defined as left multiplication by r in each entry. Also, if we put S = M_m(R), then M_mn(R) is a left S module under the usual definition of matrix multiplication: (sv)_ij = Σ_k s_ik v_kj, where s is in S and v is in M_mn(R).
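To see the left S module structure concretely, here is a minimal Python sketch, assuming R = Z, m = 2, and n = 3, with matrices stored as lists of rows; the matrices s, t, v, w are arbitrary hypothetical choices. It checks axiom (i), (st)v = s(tv), and both distributive laws of axiom (ii) for one instance each.

```python
def matmul(a, b):
    """Matrix product: (ab)_ij = sum over k of a_ik b_kj."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def matadd(a, b):
    """Entrywise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

# Hypothetical sample data with R = Z: s, t in S = M_2(Z); v, w in M_23(Z).
s = [[1, 2], [0, -1]]
t = [[3, 0], [4, 5]]
v = [[1, 0, 2], [-1, 3, 1]]
w = [[0, 1, 1], [2, 2, -3]]

assoc = matmul(matmul(s, t), v) == matmul(s, matmul(t, v))        # (st)v = s(tv)
dist_scalar = matmul(matadd(s, t), v) == matadd(matmul(s, v), matmul(t, v))
dist_vector = matmul(s, matadd(v, w)) == matadd(matmul(s, v), matmul(s, w))
print(assoc, dist_scalar, dist_vector)   # True True True
```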

(7) Direct product of R modules. If S is a nonempty set and {M_s}_{s∈S} is a corresponding system of left R modules, then a direct product ∏_{s∈S} M_s is obtained as an additive group by forming any direct product of the underlying additive groups of the M_s's and defining scalar multiplication by members of R to be scalar multiplication in each coordinate. The associated abelian-group homomorphisms p_{s_0} : ∏_{s∈S} M_s → M_{s_0} become R homomorphisms under this definition of scalar multiplication on the direct product. Direct product amounts to the product functor for the category of left R modules; we omit the easy verification, which makes use of the corresponding fact about abelian groups. As in the case of abelian groups, we can speak of an external direct product as the result of a construction that starts with the product of the sets M_s, and we can speak of recognizing a direct product as internal when the M_s's are contained in the direct product and the restriction of each p_s to M_s is the identity function.

(8) Direct sum of R modules. If S is a nonempty set and {M_s}_{s∈S} is a corresponding system of left R modules, then a direct sum ⊕_{s∈S} M_s is obtained as an additive group by forming any direct sum of the underlying additive groups of the M_s's and defining scalar multiplication by members of R to be scalar multiplication in each coordinate. The associated abelian-group homomorphisms i_{s_0} : M_{s_0} → ⊕_{s∈S} M_s become R homomorphisms under this definition of scalar multiplication on the direct sum. Direct sum amounts to the coproduct functor for the category of left R modules; we omit the easy verification, which makes use of the corresponding fact about abelian groups. As in the case of abelian groups, we can speak of an external direct sum as the result of a construction that starts with a subset of the product of the sets M_s, and we can speak of recognizing a direct sum as internal when the M_s's are contained in the direct sum and each i_s is the inclusion mapping.

(9) Free R modules. Let R be a nonzero ring with identity, and let S be a nonempty set. As in Example 5, let us regard R as a unital left R module. Then the left R module given as the direct sum F(S) = ⊕_{s∈S} R is called a free R module, or free left R module. We define ι : S → F(S) by ι(s) = i_s(1), where i_s is the usual embedding map for the direct sum of R modules. The left R module F(S) has a universal mapping property similar to the corresponding property of free abelian groups. This is illustrated in Figure 8.2 and is described as follows: whenever M is a unital left R module and φ : S → M is a function, then there exists a unique R homomorphism Φ : F(S) → M such that Φ ∘ ι = φ. The existence of Φ as an R homomorphism follows from the universal mapping property of direct sums (Example 8) as soon as the property is demonstrated for S equal to a singleton set. Thus let A be any unital left R module, and let a ∈ A be given; then it is evident that r ↦ ra is the unique R homomorphism of the left R module R into A carrying 1 to a.

FIGURE 8.2. Universal mapping property of a free left R module. [Diagram: φ : S → M, ι : S → F(S), and Φ : F(S) → M with Φ ∘ ι = φ.]
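The universal mapping property of F(S) amounts to the formula Φ(Σ r_s ι(s)) = Σ r_s φ(s). The following Python sketch illustrates it under simplifying assumptions: R = Z, elements of F(S) are stored as dictionaries {s: r_s} holding only the finitely many nonzero coefficients, and the target is the unital Z module M = Z/12. The function names (`iota`, `add_free`, `scale_free`, `extend`) and the sample map φ are hypothetical.

```python
def iota(s):
    """iota(s) = i_s(1): coefficient 1 at s and 0 elsewhere."""
    return {s: 1}

def add_free(x, y):
    """Coordinatewise sum in F(S), dropping zero coefficients."""
    out = dict(x)
    for s, r in y.items():
        out[s] = out.get(s, 0) + r
    return {s: r for s, r in out.items() if r != 0}

def scale_free(r, x):
    """Scalar multiple r x in F(S)."""
    return {s: r * c for s, c in x.items() if r * c != 0}

def extend(phi, m_add, m_scale, m_zero):
    """Given phi : S -> M (a dict), return Phi with Phi(sum r_s iota(s)) = sum r_s phi(s)."""
    def Phi(x):
        total = m_zero
        for s, r in x.items():
            total = m_add(total, m_scale(r, phi[s]))
        return total
    return Phi

# Target module M = Z/12 and a hypothetical function phi on S = {"a", "b", "c"}.
phi = {"a": 5, "b": 7, "c": 0}
Phi = extend(phi, lambda p, q: (p + q) % 12, lambda r, p: (r * p) % 12, 0)

x = {"a": 2, "b": -1}
y = {"b": 4, "c": 9}
print([Phi(iota(s)) == phi[s] for s in phi])                     # Phi o iota = phi
print(Phi(add_free(scale_free(3, x), y)) == (3 * Phi(x) + Phi(y)) % 12)
```

Uniqueness is visible in the code: once the values φ(s) are fixed, R linearity leaves no freedom in Φ, since every element of F(S) is a finite combination of the ι(s).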

If R is a ring and M is a left R module, then an R submodule N of M is an
additive  subgroup  of  M  that  is  closed  under  scalar  multiplication,  i.e.,  has  rm  in 
N  when  r  is  in  R  and  m  is  in  N.  In  situations  in  which  there  is  no  ambiguity,  the 
use  of  “left”  in  connection  with  R  submodules  is  not  necessary. 

Examples of submodules. If V is a vector space over a field K, then a K submodule of V is a vector subspace of V. If M is an abelian group, then a Z submodule of M is a subgroup. In Example 6 of modules, in which S = M_m(R), an example of a left S submodule of M_mn(R) is the set of all matrices with 0 in every entry of a specified subset of the n columns.

If the ring R has an identity and M is a unital left R module, then the R submodule of M generated by m ∈ M, i.e., the smallest R submodule containing m, is Rm, the set of products rm with r in R. In fact, the set of all rm is an abelian group since (r ± s)m = rm ± sm, it is closed under scalar multiplication since s(rm) = (sr)m, and it contains m since 1m = m. However, if the left R module M is not unital, then the R submodule generated by m may not equal Rm, and it was for that reason that R modules were assumed to be unital in the construction of free R modules in Example 9 of modules above. More generally the R submodule of M generated by a finite set {m_1, ..., m_n} in M is Rm_1 + ⋯ + Rm_n if the left R module M is unital.
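For a concrete instance, take R = Z and the unital Z module M = Z/12. The Python sketch below, a hypothetical illustration whose function names are our own, compares Rm_1 + ⋯ + Rm_n with the smallest subgroup containing the m_i, found by saturation. In a Z module, closure under addition and negation already gives closure under scalar multiplication, so saturating under those two operations suffices.

```python
N = 12  # work in the Z module Z/12

def cyclic_submodule(m):
    """Rm = {rm : r in Z}; r in range(N) already gives every residue."""
    return {(r * m) % N for r in range(N)}

def sum_of_cyclics(gens):
    """Rm_1 + ... + Rm_n as a set of residues."""
    total = {0}
    for g in gens:
        total = {(a + b) % N for a in total for b in cyclic_submodule(g)}
    return total

def generated_submodule(gens):
    """Smallest subset containing 0 and gens, closed under + and negation."""
    sub = {0} | {g % N for g in gens}
    while True:
        new = {(a + b) % N for a in sub for b in sub} | {(-a) % N for a in sub}
        if new <= sub:
            return sub
        sub |= new

print(sorted(cyclic_submodule(8)))          # [0, 4, 8]
print(sum_of_cyclics([8, 6]) == generated_submodule([8, 6]))
```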

Example 5 of modules treated R as a left R module. In this setting the left R submodules are called left ideals in R. That is, a left ideal I is an additive subgroup of R such that ri is in I whenever r is in R and i is in I. As a special case of what was said in the previous paragraph, if the ring R has an identity, then the left R module R is automatically unital, and the left ideal of R generated by an element a is Ra, the set of all products ra with r in R.

Similarly a right ideal in R is an additive subgroup I such that ir is in I whenever r is in R and i is in I. The right ideals are the right R submodules
of  the  right  R  module  R.  If  R  is  commutative,  then  left  ideals,  right  ideals,  and 
two-sided  ideals  are  all  the  same. 

Suppose that φ : M → N is an R homomorphism of left R modules. In this situation we readily verify that the kernel of φ, denoted by ker φ as usual, is an R submodule of M, and the image of φ, denoted by image φ as usual, is an R submodule of N. The R homomorphism φ is one-one if and only if ker φ = 0, as
a  consequence  of  properties  of  homomorphisms  of  abelian  groups.  A  one-one  R 
homomorphism  of  one  left  R  module  onto  another  is  called  an  R  isomorphism; 
its  inverse  is  automatically  an  R  isomorphism,  and  “is  R  isomorphic  to”  is  an 
equivalence  relation. 

Still with R as a ring, suppose that M is a left R module and N is an R submodule. Then we can form the quotient M/N of abelian groups. This becomes a left R module under the definition r(m + N) = rm + N, as we readily check. We call M/N a quotient module. The quotient mapping m ↦ m + N of M to M/N is an R homomorphism onto. A particular example of a quotient module is R/I, where I is a left ideal in R.

We can now go over the results on quotients of abelian groups in Section IV.2, specifically Proposition 4.11 through Theorem 4.14, and check that they extend immediately to results about left R modules. The statements appear below. The arguments are all routine, and there is no point in repeating them. In the special case that R is a field and the R modules are vector spaces, these results specialize to results proved in Sections II.5 and II.6.

Proposition 8.1. Let R be a ring, let φ : M_1 → M_2 be an R homomorphism between left R modules, let N_0 = ker φ, let N be an R submodule of M_1 contained in N_0, and define q : M_1 → M_1/N to be the R module quotient map. Then there exists an R homomorphism φ̄ : M_1/N → M_2 such that φ = φ̄q, i.e., φ̄(m_1 + N) = φ(m_1). It has the same image as φ, and ker φ̄ = {h_0 + N | h_0 ∈ N_0}.

Remark. As with groups, one says that φ factors through M_1/N or descends to M_1/N. Figure 8.3 illustrates matters.
FIGURE 8.3. Factorization of R homomorphisms via a quotient of R modules. [Diagram: φ : M_1 → M_2, q : M_1 → M_1/N, and φ̄ : M_1/N → M_2 with φ = φ̄q.]

Corollary 8.2. Let R be a ring, let φ : M_1 → M_2 be an R homomorphism between left R modules, and suppose that φ is onto M_2 and has kernel N. Then φ exhibits the left R module M_1/N as canonically R isomorphic to M_2.

Theorem 8.3 (First Isomorphism Theorem). Let R be a ring, let φ : M_1 → M_2 be an R homomorphism between left R modules, and suppose that φ is onto M_2 and has kernel K. Then the map N_1 ↦ φ(N_1) gives a one-one correspondence between

(a) the R submodules N_1 of M_1 containing K and

(b) the R submodules of M_2.

Under this correspondence the mapping m + N_1 ↦ φ(m) + φ(N_1) is an R isomorphism of M_1/N_1 onto M_2/φ(N_1).

Remark. In the special case of the last statement that φ : M_1 → M_2 is an R module quotient map q : M → M/K and N is an R submodule of M containing K, the last statement of the theorem asserts the R isomorphism M/N ≅ (M/K)/(N/K).

Theorem 8.4 (Second Isomorphism Theorem). Let R be a ring, let M be a left R module, and let N_1 and N_2 be R submodules of M. Then N_1 ∩ N_2 is an R submodule of N_1, the set N_1 + N_2 of sums is an R submodule of M, and the map n_1 + (N_1 ∩ N_2) ↦ n_1 + N_2 is a well-defined canonical R isomorphism

$$N_1/(N_1 \cap N_2) \cong (N_1 + N_2)/N_2.$$
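The Second Isomorphism Theorem can be checked by brute force in a small case. The following Python sketch assumes R = Z, M = Z/60, N_1 = the multiples of 4, and N_2 = the multiples of 6 (all hypothetical choices for illustration). It builds the map n_1 + (N_1 ∩ N_2) ↦ n_1 + N_2 on cosets, checks that it is well defined and one-one, and checks that its image is all of (N_1 + N_2)/N_2.

```python
N = 60  # ambient module M = Z/60

def multiples(d):
    """The submodule of Z/60 generated by d."""
    return frozenset((d * k) % N for k in range(N))

def coset(x, sub):
    """The coset x + sub, as a frozenset of residues."""
    return frozenset((x + h) % N for h in sub)

N1, N2 = multiples(4), multiples(6)
meet = N1 & N2                                         # N1 ∩ N2
join = frozenset((a + b) % N for a in N1 for b in N2)  # N1 + N2

iso = {}
well_defined = True
for n1 in N1:
    src, dst = coset(n1, meet), coset(n1, N2)
    if iso.setdefault(src, dst) != dst:   # same source coset, different image?
        well_defined = False

one_one = len(set(iso.values())) == len(iso)
onto = set(iso.values()) == {coset(x, N2) for x in join}
print(well_defined, one_one, onto, len(iso))   # True True True 3
```

Both quotients have order 3 here: N_1 ∩ N_2 consists of the 5 multiples of 12 inside the 15 multiples of 4, and N_1 + N_2 consists of the 30 even residues, containing the 10 multiples of 6.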

A  quotient  of  a  direct  sum  of  R  modules  by  the  direct  sum  of  R  submodules 
is  the  direct  sum  of  the  quotients,  according  to  the  following  proposition.  The 
result  generalizes  Lemma  4.58,  which  treats  the  special  case  of  abelian  groups 
(unital  Z  modules). 

Proposition 8.5. Let R be a ring, let M = ⊕_{s∈S} M_s be a direct sum of left R modules, and for each s in S, let N_s be a left R submodule of M_s. Then the natural map of ⊕_{s∈S} M_s to the direct sum of quotients descends to an R isomorphism

$$\Big(\bigoplus_{s \in S} M_s\Big) \Big/ \Big(\bigoplus_{s \in S} N_s\Big) \cong \bigoplus_{s \in S} (M_s/N_s).$$

PROOF. Let φ : ⊕_{s∈S} M_s → ⊕_{s∈S} (M_s/N_s) be the R homomorphism defined by φ({m_s}_{s∈S}) = {m_s + N_s}_{s∈S}. The mapping φ is onto ⊕_{s∈S} (M_s/N_s), and the kernel is ⊕_{s∈S} N_s. Then Corollary 8.2 shows that φ descends to the required R isomorphism. □

Examples  of  rings,  continued. 

(15) Associative algebras over commutative rings with identity. These directly generalize Example 6 of rings. Let R be a nonzero commutative ring with identity. An associative algebra over R, or associative R algebra, is a ring A that is also a left R module such that multiplication A × A → A is R linear in each variable. The conditions of R linearity in each variable mean that multiplication satisfies the usual distributive laws for a ring and that the following condition is to be satisfied relating multiplication and scalar multiplication:

$$(ra)b = r(ab) = a(rb) \quad \text{for all } r \in R \text{ and } a, b \in A.$$

If  A  has  an  identity,  the  displayed  condition  says  that  all  scalar  multiples  of 
the  identity  lie  in  the  center  of  A,  i.e.,  commute  with  every  element  of  A. 
Examples  2  and  3,  treating  polynomial  rings  and  matrix  rings  whose  scalars 
lie  in  a  commutative  ring  with  identity,  furnish  examples.  Every  ring  R  is  an 
associative  Z  algebra  when  the  Z  action  is  defined  so  as  to  make  the  abelian 
group  underlying  the  additive  structure  of  R  into  a  Z  module.  All  that  needs  to  be 
checked is the displayed formula. For n = 1, we have (1a)b = 1(ab) = a(1b) since the Z module R is unital. If we also have (na)b = n(ab) = a(nb) for a positive integer n, then we can add and use the appropriate distributive laws to obtain ((n + 1)a)b = (n + 1)(ab) = a((n + 1)b). Induction therefore gives (na)b = n(ab) = a(nb) for all positive integers n, and this equality extends to all integers n by using additive inverses. The associative R algebras form a category in which the morphisms from one such algebra to another are the ring homomorphisms that are also R homomorphisms. The product functor for this category is the direct product as in Example 14 with an overlay of scalar multiplication as in Example 7 of modules. The coproduct functor in the category of commutative associative R algebras with identity is more subtle and involves a tensor product over R, a notion we postpone introducing until Chapter X.

(16) Group algebra RG over R. If G is a group and R is a commutative ring with identity, then we can introduce a multiplication in the free R module RG on the elements of G by the definition (Σ_i r_i g_i)(Σ_j s_j h_j) = Σ_{i,j} (r_i s_j)(g_i h_j) when the r_i and s_j are in R and the g_i and h_j are in G. It is immediate that this multiplication makes the free R module into an associative R algebra with identity, and RG is called the group algebra of G over R. The special case R = Z leads to the integral group ring as in Example 12. The group G is embedded as a subgroup of the group (RG)^× of units of RG, each element g being identified with a sum ι(g) = Σ_i r_i g_i in which the only nonzero term is 1g. The associative R algebra RG has a universal mapping property similar to that in Figure 8.1 and given in Figure 8.4 as follows: whenever φ : G → A^× is a group homomorphism of G into the group A^× of units of an associative R algebra A with identity, then there exists a unique associative R algebra homomorphism Φ : RG → A such that Φ ∘ ι = φ.

FIGURE 8.4. Universal mapping property of the group algebra RG. [Diagram: φ : G → A, ι : G → RG, and Φ : RG → A with Φ ∘ ι = φ.]
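The multiplication rule for RG is easy to implement when G is finite. The Python sketch below assumes R = Z and G = S_3, with permutations of {0, 1, 2} stored as tuples and the product gh meaning "apply h, then g"; elements of ZG are dictionaries {g: coefficient}. These representation choices and names are hypothetical. It checks that ι(g)ι(h) = ι(gh), that ι(e) is the identity, and associativity on one sample, and it shows that ZG is not commutative when G is not.

```python
from itertools import permutations

G = list(permutations(range(3)))   # the symmetric group S_3
E = (0, 1, 2)                      # identity permutation

def compose(g, h):
    """Group product gh: apply h first, then g."""
    return tuple(g[h[k]] for k in range(3))

def ga_mul(x, y):
    """(sum r_i g_i)(sum s_j h_j) = sum (r_i s_j)(g_i h_j), collecting like terms."""
    out = {}
    for g, r in x.items():
        for h, s in y.items():
            gh = compose(g, h)
            out[gh] = out.get(gh, 0) + r * s
    return {k: c for k, c in out.items() if c != 0}

def iota(g):
    """iota(g): the sum whose only nonzero term is 1g."""
    return {g: 1}

g, h = (1, 0, 2), (0, 2, 1)
x = {E: 2, g: -1}
y = {h: 3, compose(g, h): 1}
z = {g: 1, h: 1}

print(ga_mul(iota(g), iota(h)) == iota(compose(g, h)))
print(ga_mul(iota(E), x) == x)
print(ga_mul(x, ga_mul(y, z)) == ga_mul(ga_mul(x, y), z))
print(ga_mul(iota(g), iota(h)) == ga_mul(iota(h), iota(g)))   # False: S_3 is nonabelian
```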

(17) Scalar-valued functions of finite support on a group, with convolution as multiplication. If G is a group and R is a commutative ring with identity, denote by C(G, R) the R module of all functions from G into R that are of finite support in the sense that each function is 0 except on a finite subset of G. This R module readily becomes an associative R algebra if ring multiplication is taken to be pointwise multiplication, but the interest here is in a different definition of multiplication. Instead, multiplication is defined to be convolution with

$$(f_1 * f_2)(x) = \sum_{y \in G} f_1(xy^{-1}) f_2(y) = \sum_{y \in G} f_1(y) f_2(y^{-1}x).$$

The sums in question are finite because of the finite support of f_1 and f_2, and the sums are equal by a change of variables. This multiplication was introduced in the special case R = C in Section VII.4, and the argument for associativity given there in the special case works in general. With convolution as multiplication, C(G, R) becomes an associative R algebra with identity. Problem 14 at the end of the chapter asks for a verification that the mapping g ↦ f_g with

$$f_g(x) = \begin{cases} 1 & \text{for } x = g, \\ 0 & \text{for } x \neq g, \end{cases}$$

extends to an R algebra isomorphism of RG onto C(G, R).
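When G is finite, every function on G has finite support, and both convolution formulas can be evaluated directly. The following Python sketch assumes G = S_3 and R = Z, with permutations stored as tuples (a hypothetical representation; the names are ours). It implements the two sums, checks that they agree on sample functions, and checks that f_g * f_h = f_{gh}, which is the multiplicativity on basis elements that the isomorphism of Problem 14 requires.

```python
from itertools import permutations

G = list(permutations(range(3)))   # S_3, permutations of {0, 1, 2}

def mul(g, h):
    """Group product gh: apply h first, then g."""
    return tuple(g[h[k]] for k in range(3))

def inv(g):
    """Inverse permutation: sends g[k] back to k."""
    out = [0] * 3
    for k, gk in enumerate(g):
        out[gk] = k
    return tuple(out)

def conv_left(f1, f2):
    """(f1 * f2)(x) = sum over y of f1(x y^-1) f2(y); missing keys count as 0."""
    return {x: sum(f1.get(mul(x, inv(y)), 0) * f2.get(y, 0) for y in G) for x in G}

def conv_right(f1, f2):
    """(f1 * f2)(x) = sum over y of f1(y) f2(y^-1 x)."""
    return {x: sum(f1.get(y, 0) * f2.get(mul(inv(y), x), 0) for y in G) for x in G}

def delta(g):
    """f_g: 1 at g and 0 elsewhere."""
    return {x: (1 if x == g else 0) for x in G}

f1 = {G[0]: 2, G[3]: -1, G[5]: 4}
f2 = {G[1]: 1, G[2]: 3}
print(conv_left(f1, f2) == conv_right(f1, f2))
print(all(conv_left(delta(g), delta(h)) == delta(mul(g, h)) for g in G for h in G))
```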


2.  Integral  Domains  and  Fields  of  Fractions 

For  the  remainder  of  the  chapter  we  work  with  commutative  rings  only.  In  several 
of  the  sections,  including  this  one,  the  commutative  ring  will  be  an  integral 
domain,  i.e.,  a  nonzero  commutative  ring  with  identity  and  with  no  zero  divisors. 
In this section we show how an integral domain can be embedded canonically in a field. This embedding is handy for recognizing certain facts about integral domains as consequences of facts about fields. For example Proposition 4.28b established that if R is a nonzero integral domain and if A(X) is a polynomial in R[X] of degree n > 0, then A(X) has at most n roots. Since the coefficients of
the  polynomial  can  be  considered  to  be  members  of  the  larger  field  that  contains 
R,  this  result  is  an  immediate  consequence  of  the  corresponding  fact  about  fields 
(Corollary  1.14). 

The prototype is the construction of the field Q of rationals from the integral domain Z of integers as in Section A3 of the appendix, in which one thinks of a/b as a pair (a, b) with b ≠ 0 and then identifies pairs by saying that a/b = c/d if and only if ad = bc.

We proceed in the same way in the general case. Thus let R be an integral domain, form the set

$$\mathcal{F} = \{(a, b) \mid a \in R,\ b \in R,\ b \neq 0\},$$

and impose the equivalence relation (a, b) ~ (c, d) if ad = bc. The relation ~ is certainly reflexive and symmetric. To see that it is transitive, suppose that (a, b) ~ (c, d) and (c, d) ~ (e, f). Then ad = bc and cf = de, and these together force adf = bcf = bde. In turn, this implies af = be since R is an integral domain and d is assumed ≠ 0. Thus ~ is transitive and is an equivalence relation. Let F be the set of equivalence classes.

The definition of addition in ℱ is (a, b) + (c, d) = (ad + bc, bd), the expression we get by naively clearing fractions (here bd ≠ 0 because R has no zero divisors), and we want to see that addition is consistent with the equivalence relation. In checking this, we need change only one of the pairs at a time. Thus suppose that (a', b') ~ (a, b) and that (c, d) is given. We know that a'b = ab', and we want to see that (ad + bc, bd) ~ (a'd + b'c, b'd), i.e., that (ad + bc)b'd = (a'd + b'c)bd. Since the terms bcb'd and b'cbd are equal, we are to check that adb'd = a'dbd; we see immediately that this equality is valid since ab' = a'b. Consequently addition is consistent with the equivalence relation and descends to be defined on the set F of equivalence classes.

Taking into account the properties satisfied by members of an integral domain, we check directly that addition is commutative and associative on ℱ, and it follows that addition is commutative and associative on F.

The element (0, 1) is a two-sided identity for addition in ℱ, and hence the class of (0, 1) is a two-sided identity for addition in F. We denote this class by 0. Let us identify this class. A pair (a, b) is in the class of (0, 1) if and only if 0 · b = 1 · a, hence if and only if a = 0. In other words, the class of (0, 1) consists of all (0, b) with b ≠ 0.

In ℱ, we have (a, b) + (−a, b) = (ab + b(−a), bb) = (0, b²) ~ (0, 1), and therefore the class of (−a, b) is a two-sided inverse to the class of (a, b) under addition. Consequently F is an abelian group under addition.

The definition of multiplication in ℱ is (a, b)(c, d) = (ac, bd), and it is routine to see that this definition is consistent with the equivalence relation. Therefore multiplication descends to be defined on F. We check by inspection that multiplication is commutative and associative on ℱ, and it follows that it is commutative and associative on F. The element (1, 1) is a two-sided identity for multiplication in ℱ, and the class of (1, 1) is therefore a two-sided identity for multiplication in F. We denote this class by 1.

If (a, b) is not in the class 0, then a ≠ 0, as we saw above. Then ab ≠ 0, and we have (a, b)(b, a) = (ab, ab) ~ (1, 1). Hence the class of (b, a) is a two-sided inverse of the class of (a, b) under multiplication. Consequently the nonzero elements of F form an abelian group under multiplication.

For one of the distributive laws, the computation

$$\begin{aligned}
(a, b)\big((c, d) + (e, f)\big) &= (a, b)(cf + de, df) = \big(a(cf + de), bdf\big) \\
&= (acf + ade, bdf) \sim (acbf + bdae, b^2 df) \\
&= (ac, bd) + (ae, bf) = (a, b)(c, d) + (a, b)(e, f)
\end{aligned}$$

shows that the classes of (a, b)((c, d) + (e, f)) and of (a, b)(c, d) + (a, b)(e, f) are equal. The other distributive law follows from this one since F is commutative under multiplication. Therefore F is a field.
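The construction of F can be tested mechanically in the prototype case. The Python sketch below assumes R = Z, so that Python integers stand in for ring elements; the names `equiv`, `add`, and `mul` are hypothetical and act on pairs (a, b) with b ≠ 0, not on equivalence classes. Over a small box of pairs it checks that addition is consistent with ~, that the distributive law holds up to ~, and that (b, a) inverts (a, b) when a ≠ 0.

```python
from itertools import product

# Pairs (a, b) with b != 0, for the integral domain R = Z.
PAIRS = [(a, b) for a in range(-3, 4) for b in range(-3, 4) if b != 0]

def equiv(p, q):
    """(a, b) ~ (c, d) if ad = bc."""
    (a, b), (c, d) = p, q
    return a * d == b * c

def add(p, q):
    """(a, b) + (c, d) = (ad + bc, bd)."""
    (a, b), (c, d) = p, q
    return (a * d + b * c, b * d)

def mul(p, q):
    """(a, b)(c, d) = (ac, bd)."""
    (a, b), (c, d) = p, q
    return (a * c, b * d)

addition_consistent = all(
    equiv(add(p, q), add(p2, q))
    for p, p2, q in product(PAIRS, repeat=3) if equiv(p, p2))
distributive = all(
    equiv(mul(p, add(q, r)), add(mul(p, q), mul(p, r)))
    for p, q, r in product(PAIRS, repeat=3))
inverses = all(equiv(mul((a, b), (b, a)), (1, 1)) for a, b in PAIRS if a != 0)
print(addition_consistent, distributive, inverses)   # True True True
```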

The field F is called the field of fractions of the integral domain R. The function η : R → F defined by saying that η(r) is the class of (r, 1) is easily checked to be a homomorphism of rings sending 1 to 1. It is one-one, since η(r) = 0 means that (r, 1) ~ (0, 1), that is, r = 0. Let us call it the canonical embedding of R into F. The pair (F, η) has the universal mapping property stated in Proposition 8.6 and illustrated in Figure 8.5.


FIGURE 8.5. Universal mapping property of the field of fractions of R. [Diagram: φ : R → F', η : R → F, and φ̃ : F → F' with φ = φ̃η.]

Proposition 8.6. Let R be an integral domain, let F be its field of fractions, and let η be the canonical embedding of R into F. Whenever φ is a one-one ring homomorphism of R into a field F' carrying 1 to 1, then there exists a unique ring homomorphism φ̃ : F → F' such that φ = φ̃η, and φ̃ is one-one as a homomorphism of fields.

Remark. We say that φ̃ is the extension of φ from R to F. Once this proposition has been proved, it is customary to drop η from the notation and regard R as a subring of its field of fractions.
PROOF. If (a, b) with b ≠ 0 is a pair in ℱ, we define Φ(a, b) = φ(a)φ(b)⁻¹. This is well defined since b ≠ 0 and since φ, being one-one, cannot have φ(b) = 0. Let us see that Φ is consistent with the equivalence relation, i.e., that (a, b) ~ (a', b') implies Φ(a, b) = Φ(a', b'). Since (a, b) ~ (a', b'), we have ab' = a'b and therefore also φ(a)φ(b') = φ(a')φ(b) and Φ(a, b) = φ(a)φ(b)⁻¹ = φ(a')φ(b')⁻¹ = Φ(a', b'), as required.

We can thus define φ̃ of the class of (a, b) to be Φ(a, b), and φ̃ is well defined as a function from F to F'. If r is in R, then φ̃(η(r)) = φ̃(class of (r, 1)) = Φ(r, 1) = φ(r)φ(1)⁻¹, and this equals φ(r) since φ is assumed to carry 1 into 1. Therefore φ̃η = φ.

For uniqueness, let the class of (a, b) be given in F. Since b is nonzero, this class is the same as the class of (a, 1)(b, 1)⁻¹, which equals η(a)η(b)⁻¹. Since (φ̃η)(a) = φ(a) and (φ̃η)(b) = φ(b), we must have φ̃(class of (a, b)) = φ̃(η(a))φ̃(η(b))⁻¹ = φ(a)φ(b)⁻¹. Therefore φ uniquely determines φ̃. □

If K is a field, then R = K[X] is an integral domain, and Proposition 8.6 applies to this R. The field of fractions consists in effect of formal rational expressions P(X)Q(X)⁻¹ in the indeterminate X, with the expected identifications made. We write K(X) for this field of fractions. More generally the field of fractions of the integral domain K[X_1, ..., X_n] consists of formal rational expressions in the indeterminates X_1, ..., X_n, with the expected identifications made, and is denoted by K(X_1, ..., X_n).


3.  Prime  and  Maximal  Ideals 

In  this  section,  R  will  denote  a  commutative  ring,  not  necessarily  having  an 
identity. We shall introduce the notions of “prime ideal” and “maximal ideal,”
and  we  shall  investigate  relationships  between  these  two  notions. 

A proper ideal I in R is prime if ab ∈ I implies a ∈ I or b ∈ I. The ideal I = R is not prime, by convention.⁵ We give three examples of prime ideals; a fourth example will be given in a proposition immediately afterward.

Examples. 

(1) For Z, it was shown in an example just before Proposition 4.21 that each ideal is of the form mZ for some integer m. We may assume that m ≥ 0. The prime ideals are 0 and all pZ with p prime. To see this latter fact, consider I = mZ with m ≥ 2. If m = ab nontrivially, then neither a nor b is in I, but ab is in I; hence I is not prime. Conversely if m is prime, and if ab is in I = mZ, then ab = mc for some integer c. Since m is prime, Lemma 1.6 shows that m divides a or m divides b. Hence a is in I or b is in I. Therefore I is prime.

⁵ This convention is now standard. Books written before about 1960 usually regarded I = R as a prime ideal. Correspondingly they usually treated the zero ring as an integral domain.

(2)  If  K  is  a  field,  then  each  ideal  in  R  =  K[X]  is  of  the  form  A(X)K[X]  with 
A(X)  in  K[X],  and  A(X)K[X]  is  prime  if  and  only  if  A(X)  is  0  or  is  a  prime 
polynomial.  In  fact,  each  ideal  is  of  the  form  A(X)K[X]  by  Proposition  5.8.  If 
A(X)  is  not  a  constant  polynomial,  then  the  argument  that  A(X)K[X]  is  prime 
if and only if the polynomial A(X) is prime proceeds as in Example 1, using Lemma 1.16 in place of Lemma 1.6.

(3) In R = Z[X], the structure of the ideals is complicated, and we shall not attempt to list all ideals. Let us observe simply that the ideal I = XZ[X] is prime. In fact, if A(X)B(X) is in XZ[X], then A(X)B(X) = XC(X) for some C(X) in Z[X]. If the constant terms of A(X) and B(X) are a_0 and b_0, this equation says that a_0 b_0 = 0. Therefore a_0 = 0 or b_0 = 0. In the first case, A(X) = XP(X) for some P(X), and then A(X) is in I; in the second case, B(X) = XQ(X) for some Q(X), and then B(X) is in I. We conclude that I is prime.

Proposition 8.7. An ideal I in the commutative ring R is prime if and only if
R/I is an integral domain.

PROOF. If I = R, then R/I is the zero ring, which is not an integral domain,
and I is not prime. If a proper ideal I fails to be prime, choose ab in I with
a ∉ I and b ∉ I. Then a + I and b + I are nonzero in R/I and have product
0 + I. So R/I is nonzero and has a zero divisor; by definition, R/I fails to be
an integral domain.

Conversely if R/I (is nonzero and) has a zero divisor, choose a + I and b + I
nonzero with product 0 + I. Then neither a nor b is in I but ab is in I. Since I
is certainly proper, I is not prime. □
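
For R = Z, Example 1 and Proposition 8.7 can be cross-checked by brute force. The following Python sketch is illustrative only. It represents each coset a + mZ by its residue in 0, ..., m − 1, and the function names are our own.

```python
def is_prime(m: int) -> bool:
    """Trial-division primality test for an integer m."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True


def quotient_is_integral_domain(m: int) -> bool:
    """Decide by brute force whether Z/mZ (m >= 1) is an integral domain.

    Cosets a + mZ are represented by the residues 0, ..., m - 1.
    """
    if m == 1:
        return False  # Z/Z is the zero ring, which is not an integral domain
    # Look for nonzero residues a, b whose product is 0 modulo m (zero divisors).
    return all((a * b) % m != 0 for a in range(1, m) for b in range(1, m))


# Proposition 8.7 plus Example 1: for m >= 1, mZ is prime iff Z/mZ is an
# integral domain iff m is a prime number.
for m in range(1, 50):
    assert quotient_is_integral_domain(m) == is_prime(m)
```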

A proper ideal I in the commutative ring R is said to be maximal if R has no
proper ideal J with I ⊊ J. If the commutative ring R has an identity, a simple
way of testing whether an ideal I is proper is to check whether 1 is in I; in fact,
if 1 is in I, then I ⊇ RI ⊇ R1 = R implies I = R. Maximal ideals exist
in abundance when R is nonzero and has an identity, as a consequence of the
following result.

Proposition  8.8.  In  a  commutative  ring  R  with  identity,  any  proper  ideal  is 
contained  in  a  maximal  ideal. 

PROOF.  This  follows  from  Zorn’s  Lemma  (Section  A5  of  the  appendix). 
Specifically let I be the given proper ideal, and form the set S of all proper
ideals that contain I. This set is nonempty, containing I as a member, and we
order it by inclusion upward. If we have a chain in S, then the union of the
members of the chain is an ideal (any two of its elements lie in a common member
of the chain) that contains all the ideals in the chain, and it is proper since it
does not contain 1. Therefore the union of the ideals in the chain is an upper
bound for the chain. By Zorn’s Lemma the set S has a maximal element, and any
such maximal element is a maximal ideal containing I. □

Lemma 8.9. If R is a nonzero commutative ring with identity, then R is a field
if and only if the only proper ideal in R is 0.

PROOF. If R is a field and I is a nonzero ideal in R, let a ≠ 0 be in I. Then
1 = aa⁻¹ is in I, and consequently I = R. Conversely if the only ideals in R
are 0 and R, let a ≠ 0 be given in R, and form the ideal I = aR. Since 1 is in
R, a is in I. Thus I ≠ 0. Then I must be R. So there exists some b in R with
1 = ba, and a is exhibited as having the inverse b. □

Proposition 8.10. If R is a commutative ring with identity, then an ideal I is
maximal if and only if R/I is a field.

Remark.  One  can  readily  give  a  direct  proof,  but  it  seems  instructive  to  give 
a  proof  reducing  the  result  to  Lemma  8.9. 

PROOF. We consider R and R/I as unital R modules, the ideals for each of R
and R/I being the R submodules. The quotient ring homomorphism R → R/I is
an R homomorphism. By the First Isomorphism Theorem for modules (Theorem
8.3), there is a one-one correspondence between the ideals in R containing I and
the ideals in R/I. Under this correspondence, I is maximal if and only if R/I is
nonzero and its only proper ideal is 0. Then the result follows immediately from
Lemma 8.9. □

Corollary  8.11.  If  R  is  a  commutative  ring  with  identity,  then  every  maximal 
ideal  is  prime. 

PROOF. If I is maximal, then R/I is a field by Proposition 8.10. Hence R/I
is an integral domain, and I must be prime by Proposition 8.7. □
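
In the same illustrative spirit, Proposition 8.10 can be checked for R = Z. This is again a Python sketch with hypothetical function names, with residues standing for cosets. The ideals of Z that contain mZ are the dZ with d a positive divisor of m, so mZ is maximal exactly when m has no divisors other than 1 and m. That condition should agree with whether every nonzero residue mod m is invertible.

```python
def divisors(m: int) -> list[int]:
    """Positive divisors of m >= 1.

    The ideals of Z that contain mZ are exactly the dZ with d a positive
    divisor of m, so this list describes them.
    """
    return [d for d in range(1, m + 1) if m % d == 0]


def is_maximal(m: int) -> bool:
    """For m >= 2, mZ is maximal iff the only ideals containing it are mZ and 1Z = Z."""
    return divisors(m) == [1, m]


def quotient_is_field(m: int) -> bool:
    """For m >= 2, Z/mZ is a field iff each nonzero residue has an inverse mod m."""
    return all(any((a * b) % m == 1 for b in range(1, m)) for a in range(1, m))


for m in range(2, 60):
    # Proposition 8.10 for R = Z: mZ maximal <=> Z/mZ is a field.
    assert is_maximal(m) == quotient_is_field(m)
```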

In  the  converse  direction  nonzero  prime  ideals  need  not  be  maximal,  as  the 
following  example  shows.  However,  Proposition  8.12  will  show  that  nonzero 
prime  ideals  are  necessarily  maximal  in  certain  important  rings. 

Example. In R = Z[X], we have seen that I = XZ[X] is a prime ideal. But
I is not maximal since XZ[X] + 2Z[X] is a proper ideal (all its members have
even constant term, so it does not contain 1) that strictly contains I (it
contains 2).

Proposition 8.12. In R = Z or R = K[X] with K a field, every nonzero prime
ideal is maximal.

PROOF.  Examples  1  and  2  at  the  beginning  of  this  section  show  that  every 
nonzero prime ideal is of the form I = pR with p prime. If such an I is given
and if J is any ideal strictly containing I, choose a in J with a not in I. Since a
is not in I = pR, it is not true that p divides a. So p and a are relatively prime,
and there exist elements x and y in R with xp + ya = 1, by Proposition 1.2c or
1.15d. Since p and a are in J, so is 1. Therefore J = R, and I is not strictly
contained in any proper ideal. So I is maximal. □
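
For R = Z, the elements x and y with xp + ya = 1 used in this proof can be computed by the extended Euclidean algorithm. The sketch below shows one standard way to do this and is not necessarily the route taken in Chapter I. The name extended_gcd is our own.

```python
def extended_gcd(a: int, b: int) -> tuple[int, int, int]:
    """Return (g, x, y) with g = gcd(a, b) >= 0 and x*a + y*b == g."""
    old_r, r = a, b
    old_x, x = 1, 0   # invariant: old_r == old_x*a + old_y*b
    old_y, y = 0, 1   # invariant: r == x*a + y*b
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    if old_r < 0:  # normalize the sign of the gcd
        old_r, old_x, old_y = -old_r, -old_x, -old_y
    return old_r, old_x, old_y


# p = 7 is prime and does not divide a = 10, so the ideal J in the proof contains 1.
print(extended_gcd(7, 10))  # (1, 3, -2): 3*7 + (-2)*10 == 1
```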

Example. Algebraic number fields Q[θ]. These were introduced briefly in
Chapter IV and again in Section 1 as the Q linear span of all powers 1, θ, θ², ....
Here θ is a nonzero complex number, and we make the assumption that Q[θ] is a
finite-dimensional vector space over Q. Proposition 4.1 showed that Q[θ] is then
indeed a field. Let us see how this conclusion relates to the results of the present
section. In fact, write a nontrivial linear dependence of 1, θ, θ², ... over Q in
the form c₀ + c₁θ + c₂θ² + ··· + cₙ₋₁θⁿ⁻¹ + θⁿ = 0. Without loss of generality,
suppose that this particular linear dependence has n as small as possible among
all such relations. Then θ is a root of

$$P(X) = c_0 + c_1X + c_2X^2 + \cdots + c_{n-1}X^{n-1} + X^n.$$

Consider the substitution homomorphism E : Q[X] → C given by E(A(X)) =
A(θ). This ring homomorphism carries Q[X] onto the ring Q[θ], and the kernel
is some ideal I. Specifically I consists of all polynomials A(X) with A(θ) = 0,
and P(X) is one of these of lowest possible degree. Proposition 5.8 shows that I
consists of all multiples of some polynomial, and that polynomial may be taken
to be P(X) by minimality of the integer n. Proposition 8.1 therefore shows
that Q[θ] ≅ Q[X]/P(X)Q[X] as a ring. If P(X) were to have a nontrivial
factorization as P(X) = Q₁(X)Q₂(X), then P(θ) = 0 would imply Q₁(θ) = 0
or Q₂(θ) = 0, and we would obtain a contradiction to the minimality of n.
Therefore P(X) is prime. By Example 2 earlier in the section, I = P(X)Q[X]
is a nonzero prime ideal, and Proposition 8.12 shows that it is maximal. By
Proposition 8.10 the quotient ring Q[θ] ≅ Q[X]/P(X)Q[X] is a field. These
computations with Q[θ] underlie the first part of the theory of fields that we shall
develop in Chapter IX.
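
The conclusion that Q[X]/P(X)Q[X] is a field can also be seen computationally. If A(X) is not divisible by the prime polynomial P(X), the Euclidean algorithm in Q[X] yields U(X) with U(X)A(X) ≡ 1 modulo P(X), so U(θ) is the inverse of A(θ) in Q[θ]. The Python sketch below uses the illustrative choice θ = √2, P(X) = X² − 2, which is not taken from the text. Polynomials are lists of Fraction coefficients, constant term first, and all helper names are hypothetical.

```python
from fractions import Fraction

# Polynomials over Q are lists of coefficients, constant term first; [] is 0.


def trim(p):
    """Convert coefficients to Fractions and drop trailing zeros."""
    p = [Fraction(c) for c in p]
    while p and p[-1] == 0:
        p.pop()
    return p


def sub(p, q):
    """Return p - q."""
    n = max(len(p), len(q))
    p = list(p) + [0] * (n - len(p))
    q = list(q) + [0] * (n - len(q))
    return trim([a - b for a, b in zip(p, q)])


def mul(p, q):
    """Return p * q."""
    if not p or not q:
        return []
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return trim(out)


def divmod_poly(a, b):
    """Return (q, r) with a = b*q + r and r = 0 or deg r < deg b; b must be nonzero."""
    a, b = trim(a), trim(b)
    q = [Fraction(0)] * max(len(a) - len(b) + 1, 0)
    r = a
    while r and len(r) >= len(b):
        shift = len(r) - len(b)
        coeff = r[-1] / b[-1]        # cancels the leading term of r
        q[shift] += coeff
        r = sub(r, mul([0] * shift + [coeff], b))
    return trim(q), r


def inverse_mod(a, p):
    """Return u with u*a = 1 modulo p in Q[X], assuming a and p are relatively prime."""
    old_r, r = trim(p), trim(a)
    old_u, u = [], [Fraction(1)]     # invariant: old_r = old_u*a and r = u*a mod p
    while r:
        q, rem = divmod_poly(old_r, r)
        old_r, r = r, rem
        old_u, u = u, sub(old_u, mul(q, u))
    if len(old_r) != 1:              # the last nonzero remainder is a GCD
        raise ValueError("a is not invertible modulo p")
    c = old_r[0]                     # nonzero constant: divide it out
    return divmod_poly([x / c for x in old_u], p)[1]


P = [-2, 0, 1]            # P(X) = X^2 - 2, so theta = sqrt(2) (illustrative choice)
A = [1, 1]                # A(X) = 1 + X, i.e. the element 1 + sqrt(2)
U = inverse_mod(A, P)
print(U)                  # [Fraction(-1, 1), Fraction(1, 1)], i.e. sqrt(2) - 1
assert divmod_poly(mul(U, A), P)[1] == [1]
```

Under these assumptions the sketch recovers the familiar identity (1 + √2)⁻¹ = √2 − 1.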


4.  Unique  Factorization 

We  have  seen  that  the  positive  members  of  Z  and  the  nonzero  members  of  K[X], 
when K is a field, factor into the products of “primes” and that these factorizations
are  unique  up  to  order  and  up  to  adjusting  each  of  the  prime  factors  in  K[X]  by 
a  unit.  In  this  section  we  shall  investigate  this  idea  of  unique  factorization  more 
generally.  Zero  divisors  are  problematic  from  the  point  of  view  of  factorization, 
and  it  will  be  convenient  to  exclude  them.  Therefore  we  work  exclusively  with 
integral  domains. 

The first observation is that unique factorization is not a completely general
notion  for  integral  domains.  Let  us  consider  an  example  in  detail. 

Example. R = Z[√−5]. This is the subring of C whose members are of
the form a + b√−5 with a and b integers. Since (a + b√−5)(c + d√−5) =
(ac − 5bd) + (ad + bc)√−5, R is closed under multiplication and is indeed a
subring. Define N(a + b√−5) = a² + 5b² = (a + b√−5)(a − b√−5), the product
of a + b√−5 and its complex conjugate. This is a nonnegative-integer-valued
function on R and is 0 only on the 0 element of R. Since complex conjugation is
an automorphism of C, we check immediately that

$$N\big((a + b\sqrt{-5})(c + d\sqrt{-5})\big) = N(a + b\sqrt{-5})\,N(c + d\sqrt{-5}).$$

The group of units of R, i.e., of elements with inverses under multiplication, is
denoted by R^× as usual. If r is in R^×, then rr⁻¹ = 1, and so N(r)N(r⁻¹) =
N(1) = 1. Consequently the units r of R all have N(r) = 1. Setting a² + 5b² = 1,
we see that the units are ±1. Thus every nonzero nonunit r has N(r) ≥ 2, and the
product formula for N shows that if we start factoring a member of R, then factor
its factors, and so on, and if we forbid factorizations into two factors when one
is a unit, then the process of factorization has to stop at some point, since each
nontrivial step replaces an element by factors of strictly smaller norm. So
complete factorization makes sense. Now consider the equality

$$6 = (1 + \sqrt{-5})(1 - \sqrt{-5}) = 2 \cdot 3.$$

The factors here have N(1 + √−5) = N(1 − √−5) = 6, N(2) = 4, and
N(3) = 9. Considering the possible values of a² + 5b², we see that N(·) does
not take on either of the values 2 and 3 on R. Since a nontrivial factorization of
an element of norm 4, 6, or 9 would have a factor of norm 2 or 3, it follows that
1 + √−5, 1 − √−5, 2, and 3 do not have nontrivial factorizations. On the other
hand, consideration of the values of N(·) shows that 2 and 3 are not products of
either of 1 ± √−5 with units, since units have norm 1. We conclude that the
displayed factorizations of 6 show that unique factorization has failed.
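
The norm computations in this example are easy to reproduce mechanically. In the Python sketch below (names hypothetical), an element a + b√−5 is stored as the pair (a, b). Since a² + 5b² ≤ 9 forces |a| ≤ 3 and |b| ≤ 1, a small box of pairs lists every element of norm at most 9.

```python
def norm(x):
    """N(a + b*sqrt(-5)) = a^2 + 5*b^2 for x = (a, b)."""
    a, b = x
    return a * a + 5 * b * b


def mul(x, y):
    """Product in Z[sqrt(-5)], using (a + b s)(c + d s) = (ac - 5bd) + (ad + bc)s, s^2 = -5."""
    (a, b), (c, d) = x, y
    return (a * c - 5 * b * d, a * d + b * c)


# Every element of norm <= 9 has |a| <= 3 and |b| <= 1, so this box contains all of them.
small = [(a, b) for a in range(-3, 4) for b in range(-1, 2) if norm((a, b)) <= 9]
values = sorted({norm(x) for x in small})
print(values)                                  # [0, 1, 4, 5, 6, 9]: no 2 and no 3
print([x for x in small if norm(x) == 1])      # [(-1, 0), (1, 0)]: the units are +-1

# The two factorizations of 6 from the text.
assert mul((1, 1), (1, -1)) == (6, 0) and mul((2, 0), (3, 0)) == (6, 0)

# Spot-check the product formula N(xy) = N(x)N(y).
assert all(norm(mul(x, y)) == norm(x) * norm(y) for x in small for y in small)
```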

Thus  unique  factorization  is  not  universal  for  integral  domains.  It  is  time 
to  be  careful  about  terminology.  With  Z  and  K[X],  we  have  referred  to  the 
individual  factors  in  a  complete  factorization  as  “primes.”  Their  defining  property 
in  Chapter  I  was  that  they  could  not  be  factored  further  in  nontrivial  fashion. 
Primes  in  these  rings  were  shown  to  have  the  additional  property  that  if  a  prime 
divides  a  product  then  it  divides  one  of  the  factors.  It  is  customary  to  separate 
these  two  properties  for  general  integral  domains.  Let  us  say  that  a  nonzero 
element  a  divides  b  if  b  =  ac  for  some  c.  In  this  case  we  say  also  that  a  is 
a factor of b. In an integral domain R, a nonzero element r that is not a unit
is said to be irreducible if every factorization r = r₁r₂ in R has the property
that either r₁ or r₂ is a unit. Nonzero nonunits that are not irreducible are said
to be reducible. A nonzero element p that is not a unit is said to be prime⁶ if
the condition that p divides a product ab always implies that p divides a or p
divides b.

Prime  implies  irreducible.  In  fact,  if  p  is  a  prime  that  is  reducible,  let  us  write 
p = r₁r₂ with neither r₁ nor r₂ equal to a unit. Since p is prime, p divides r₁ or
r₂, say r₁. Then r₁ = pc with c in R, and we obtain p = r₁r₂ = pcr₂. Since
R is an integral domain, 1 = cr₂, and r₂ is exhibited as a unit with inverse c, in
contradiction to the assumption that r₂ is not a unit.

On the other hand, irreducible does not imply prime. In fact, we saw in
Z[√−5] that 1 + √−5 is irreducible. But 1 + √−5 divides 2·3, and 1 + √−5
does not divide either of 2 or 3, since its norm 6 divides neither N(2) = 4 nor
N(3) = 9. Therefore 1 + √−5 is not prime.
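
Divisibility in Z[√−5] can be decided with the norm: x divides y exactly when y·conj(x)/N(x) has integer coordinates, where conj denotes complex conjugation. The hypothetical helper below uses the pair representation (a, b) for a + b√−5 and confirms the three divisibility claims just made.

```python
def divides(x, y):
    """True if the nonzero x = a + b*sqrt(-5) divides y = c + d*sqrt(-5) in Z[sqrt(-5)].

    y / x = y * conj(x) / N(x), so x divides y iff both coordinates of
    y * conj(x) are divisible by N(x) = a^2 + 5 b^2.
    """
    a, b = x
    c, d = y
    n = a * a + 5 * b * b
    # y * conj(x) = (c + d s)(a - b s) with s^2 = -5 equals (ca + 5db) + (da - cb) s
    re = c * a + 5 * d * b
    im = d * a - c * b
    return re % n == 0 and im % n == 0


x = (1, 1)  # 1 + sqrt(-5)
print(divides(x, (6, 0)), divides(x, (2, 0)), divides(x, (3, 0)))  # True False False
```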

We  shall  see  in  a  moment  that  the  distinction  between  “irreducible”  and  “prime” 
lies  at  the  heart  of  the  question  of  unique  factorization.  Let  us  make  a  definition 
that  helps  identify  our  problem  precisely.  We  say  that  an  integral  domain  R  is  a 
unique  factorization  domain  if  R  has  the  two  properties 

(UFD1) every nonzero nonunit of R is a finite product of irreducible elements,

(UFD2) the factorization in (UFD1) is always unique up to order and to
multiplication of the factors by units.

The  problem  that  arises  for  us  for  a  given  R  is  to  decide  whether  R  is  a  unique 
factorization  domain.  The  following  proposition  shows  the  relevance  of  the 
distinction  between  “irreducible”  and  “prime.” 

Proposition 8.13. In an integral domain R in which (UFD1) holds, the
condition (UFD2) is equivalent to the condition
(UFD2') every irreducible element is prime.

Remarks. In fact, showing that irreducible implies prime was the main step
in Chapter I in proving unique factorization for positive integers and for K[X]
when K is a field. The mechanism for carrying out the proof that irreducible
implies prime for those settings will be abstracted in Theorems 8.15 and 8.17.

PROOF. Suppose that (UFD2) holds, that p is an irreducible element, and
that p divides ab. We are to show that p divides a or p divides b. We may
assume that ab ≠ 0. Write ab = pc, and let a = ∏ᵢ pᵢ, b = ∏ⱼ p′ⱼ, and
c = ∏ₖ qₖ be factorizations via (UFD1) into products of irreducible elements.
Then ∏ᵢ pᵢ ∏ⱼ p′ⱼ = p ∏ₖ qₖ. By (UFD2) one of the factors on the left side is εp
for some unit ε. Then p either is of the form ε⁻¹pᵢ, and then p divides a, or is of
the form ε⁻¹p′ⱼ, and then p divides b. Hence (UFD2') holds.

Conversely suppose that (UFD2') holds. Let the nonzero nonunit r have two
factorizations into irreducible elements as r = p₁p₂···pₘ = ε₀q₁q₂···qₙ with
m ≤ n and with ε₀ a unit. We prove the uniqueness by induction on m, the case
m = 0 following vacuously since r is not a unit and the case m = 1 following
from the definition of “irreducible.” Inductively from (UFD2') we know that pₘ
divides qₖ for some k. Since qₖ is irreducible, qₖ = εpₘ for some unit ε. Thus we
can cancel pₘ and obtain p₁p₂···pₘ₋₁ = ε₀εq₁q₂···q̂ₖ···qₙ, the hat indicating
an omitted factor. By induction the factors on the two sides here are the same
except for order and units. Thus the same conclusion is valid when comparing the
two sides of the equality p₁p₂···pₘ = ε₀q₁q₂···qₙ. The induction is complete,
and (UFD2) follows. □

6 This definition enlarges the definition of “prime” in Z to include the negatives of the usual prime
numbers. Unique factorization immediately extends to nonzero integers of either sign, but the prime
factors are now determined only up to factors of ±1. In cases where confusion about the sign of an
integer prime might arise, the text will henceforth refer to “primes of Z” or “integer primes” when
both signs are allowed, and to “positive primes” or “prime numbers” when the primes are understood
to be as in Chapter I.

It  will  be  convenient  to  simplify  our  notation  for  ideals.  In  any  commutative 
ring  R  with  identity,  if  a  is  in  R,  we  let  (a)  denote  the  ideal  Ra  generated  by  a. 
An  ideal  of  this  kind  with  a  single  generator  is  called  a  principal  ideal.  More 
generally, if a₁, ..., aₙ are members of R, then (a₁, ..., aₙ) denotes the ideal
Ra₁ + ··· + Raₙ generated by a₁, ..., aₙ. For example, in Z[X], (2, X) denotes
the ideal 2Z[X] + XZ[X] of all polynomials whose constant term is even. The
following proposition explains in part what it means for an element to be prime.

Proposition 8.14. A nonzero element p in an integral domain R is prime if
and only if the ideal (p) in R is prime.

PROOF. Suppose that the element p is prime. Then the ideal (p) is not R; in
fact, otherwise 1 would have to be of the form 1 = rp for some r ∈ R, r would be
a multiplicative inverse of p, and p would be a unit. Now suppose that a product
ab is in the ideal (p). Then ab = pr for some r in R, and p divides ab. Since p
is prime, p divides a or p divides b. Therefore the ideal (p) is prime.

Conversely suppose that (p) is a prime ideal with p ≠ 0. Since (p) ≠ R, p
is not a unit. If p divides the product ab, then ab = pc for some c in R. Hence
ab is in (p). Since (p) is assumed prime, either a is in (p) or b is in (p). In the
first case, p divides a, and in the second case, p divides b. Thus the element p is
prime. □

An  integral  domain  R  is  called  a  principal  ideal  domain  if  every  ideal  in  R  is 
principal.  At  the  beginning  of  Section  3,  we  saw  a  reminder  that  Z  is  a  principal 
ideal domain and that so is K[X] whenever K is a field. It turns out that unique
factorization for these cases is a consequence of this fact.

Theorem 8.15. Every principal ideal domain is a unique factorization domain.

Remarks.  Let  R  be  the  given  principal  ideal  domain.  Proposition  8.13  shows 
that  it  is  enough  to  show  that  (UFD1)  and  (UFD2')  hold  in  R. 

PROOF OF (UFD1). Let a₁ be a nonzero nonunit of R. Then the ideal (a₁)
in R is proper and nonzero, and Proposition 8.8 shows that it is contained in a
maximal ideal. Since R is a principal ideal domain, this maximal ideal is of the
form (c₁) for some c₁, and c₁ is a nonzero nonunit. Maximal ideals are prime
by Corollary 8.11, and Proposition 8.14 thus shows that c₁ is a prime element,
necessarily irreducible. Therefore the inclusion (a₁) ⊆ (c₁) shows that some
irreducible element, namely c₁, divides a₁.

Write a₁ = c₁a₂, and, as long as a₂ is a nonunit, repeat the above argument
with a₂. Iterating this construction, we obtain aₙ = cₙaₙ₊₁ for each n with cₙ
irreducible. Thus a₁ = c₁c₂···cₙaₙ₊₁ with c₁, ..., cₙ irreducible. Let us see that
this process cannot continue indefinitely. Assuming the contrary, we are led to
the strict inclusions

$$(a_1) \subsetneq (a_2) \subsetneq (a_3) \subsetneq \cdots.$$

Put I = ⋃ₙ (aₙ). Then I is an ideal, since the ideals (aₙ) increase with n. Since R
is a principal ideal domain, I = (a) for some element a. This element a must be in
(aₖ) for some k, and then we have (aₖ) = (aₖ₊₁) = ··· = (a). Since (aₖ) = (aₖ₊₁),
cₖ has to be a unit, contradiction. Thus the process stops: some aₖ with k ≥ 2 is a
unit, and a₁ = c₁···cₖ₋₂(cₖ₋₁aₖ) is the desired factorization, cₖ₋₁aₖ being
irreducible as a unit multiple of an irreducible element. This proves (UFD1). □

PROOF OF (UFD2'). If p is an irreducible element, we prove that the ideal (p)
is maximal. Corollary 8.11 shows that (p) is prime, and Proposition 8.14 shows
that p is prime. Thus (UFD2') will follow.

The element p, being irreducible, is not a unit. Thus (p) is proper. Suppose
that I is an ideal with I ⊋ (p). Since R is a principal ideal domain, I = (c)
for some c. Then p = rc for some r in R. Since I ≠ (p), r cannot be a unit.
Therefore the irreducibility of p implies that c is a unit. Then I = (c) = (1) = R,
and we conclude that (p) is maximal. □

Let  us  record  what  is  essentially  a  corollary  of  the  proof. 

Corollary  8.16.  In  a  principal  ideal  domain,  every  nonzero  prime  ideal  is 
maximal. 

PROOF. Let (p) be a nonzero prime ideal. Proposition 8.14 shows that p
is prime, and prime elements are automatically irreducible. The argument for
(UFD2') in the proof of Theorem 8.15 then deduces in the context of a principal
ideal domain that (p) is maximal. □

Principal ideal domains arise comparatively infrequently, and recognizing
them is not necessarily easy. The technique that was used with Z and K[X]
generalizes slightly, and we take up that generalization now. An integral domain
R is called a Euclidean domain if there exists a function δ : R → {integers ≥ 0}
such that whenever a and b are in R with b ≠ 0, there exist q and r in R with
a = bq + r and δ(r) < δ(b). The ring Z of integers is a Euclidean domain if we
take δ(n) = |n|, and the ring K[X] for K a field is a Euclidean domain if we take
δ(P(X)) to be 2^(deg P) if P(X) ≠ 0 and to be 0 if P(X) = 0.

Another example of a Euclidean domain is the ring Z[√−1] = Z + Z√−1 of
Gaussian integers. It has δ(a + b√−1) = (a + b√−1)(a − b√−1) = a² + b²,
a and b being integers. Let us abbreviate √−1 as i. To see that δ has the required
property, we first extend δ to Q[i], writing δ(x + yi) = (x + yi)(x − yi) = x² + y²
if x and y are rational. We use the fact that

$$\delta(zz') = \delta(z)\,\delta(z') \qquad \text{for } z \text{ and } z' \text{ in } \mathbb{Q}[i],$$

which follows from the computation

$$\delta(zz') = zz'\,\overline{zz'} = z\bar z\,z'\bar z' = \delta(z)\,\delta(z').$$

For any real number u, let [u] be the greatest integer ≤ u. Every real u satisfies
|[u + 1/2] − u| ≤ 1/2. Given a + bi and c + di with c + di ≠ 0, we write

$$\frac{a+bi}{c+di} = \frac{(a+bi)(c-di)}{c^2+d^2} = \frac{ac+bd}{c^2+d^2} + \frac{bc-ad}{c^2+d^2}\,i.$$

Put

$$p = \Big[\frac{ac+bd}{c^2+d^2} + \frac12\Big], \qquad q = \Big[\frac{bc-ad}{c^2+d^2} + \frac12\Big], \qquad r + si = (a+bi) - (c+di)(p+qi).$$

Then

$$a + bi = (c+di)(p+qi) + (r+si),$$

and

$$\delta(r+si) = \delta\big((a+bi) - (c+di)(p+qi)\big) = \delta(c+di)\,\delta\Big(\frac{a+bi}{c+di} - (p+qi)\Big).$$

The complex number

$$x + yi = \frac{a+bi}{c+di} - (p+qi) = \Big(\frac{ac+bd}{c^2+d^2} - p\Big) + \Big(\frac{bc-ad}{c^2+d^2} - q\Big)i$$

has |x| ≤ 1/2 and |y| ≤ 1/2, and therefore δ(x + yi) = x² + y² ≤ 1/4 + 1/4 = 1/2.
Hence δ(r + si) ≤ (1/2)δ(c + di) < δ(c + di), as required.
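
The construction of p, q, r, s in Z[i] translates directly into code. In this Python sketch (function name hypothetical), the rounding [u + 1/2] with u = t/n is computed exactly in integer arithmetic as the floor of (2t + n)/(2n), which avoids floating-point error. The text's bound δ(x + yi) ≤ 1/2 becomes the integer inequality 2δ(r + si) ≤ δ(c + di).

```python
def gaussian_divmod(a: int, b: int, c: int, d: int) -> tuple[int, int, int, int]:
    """Divide a+bi by c+di != 0 in Z[i], following the construction in the text.

    Returns (p, q, r, s) with a+bi = (c+di)(p+qi) + (r+si) and
    2*delta(r+si) <= delta(c+di), where delta(x+yi) = x^2 + y^2.
    """
    n = c * c + d * d                # delta(c+di)
    if n == 0:
        raise ZeroDivisionError("c + di must be nonzero")
    t1 = a * c + b * d               # (a+bi)/(c+di) = (t1 + t2 i)/n
    t2 = b * c - a * d
    # [u + 1/2] with u = t/n, computed exactly as floor((2t + n) / (2n))
    p = (2 * t1 + n) // (2 * n)
    q = (2 * t2 + n) // (2 * n)
    # r + si = (a+bi) - (c+di)(p+qi), using (c+di)(p+qi) = (cp - dq) + (cq + dp)i
    r = a - (c * p - d * q)
    s = b - (c * q + d * p)
    return p, q, r, s


print(gaussian_divmod(7, 2, 2, -1))  # (2, 2, 1, 0): 7 + 2i = (2 - i)(2 + 2i) + 1
```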

Some further examples of this kind appear in Problems 13 and 25-26 at the
end of the chapter. The matter is a little delicate. The ring Z[√−5] may seem
superficially similar to Z[√−1]. But Z[√−5] does not have unique factorization,
and the following theorem, in combination with Theorem 8.15, assures us that
Z[√−5] cannot be a Euclidean domain.

Theorem 8.17. Every Euclidean domain is a principal ideal domain.

PROOF. Let I be an ideal in R. We are to show that I is principal. Without
loss of generality, we may assume that I ≠ 0. Choose b ≠ 0 in I with δ(b) as
small as possible. Certainly I ⊇ (b). If a ≠ 0 is in I, write a = bq + r with
δ(r) < δ(b). Then r = a − bq is in I with δ(r) < δ(b). The minimality of b
forces r = 0 and a = bq. Thus I ⊆ (b), and we conclude that I = (b). □
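
The proof above picks an element of minimal δ without saying how to find it. In Z[i], one way to find a generator of an ideal (x, y) is to iterate the division step, that is, to run the Euclidean algorithm. The sketch below represents a + bi as the pair (a, b), and its helper names are hypothetical. It also tracks u and v with ux + vy = g, so g lies in (x, y) and, as the last nonzero remainder, divides both x and y.

```python
# Gaussian integers a + bi are represented as integer pairs (a, b).


def g_add(x, y):
    return (x[0] + y[0], x[1] + y[1])


def g_sub(x, y):
    return (x[0] - y[0], x[1] - y[1])


def g_mul(x, y):
    (a, b), (c, d) = x, y
    return (a * c - b * d, a * d + b * c)


def g_divmod(x, y):
    """Division with remainder in Z[i] as in the text: round each coordinate of x/y."""
    (a, b), (c, d) = x, y
    n = c * c + d * d                # delta(y), assumed nonzero
    q = ((2 * (a * c + b * d) + n) // (2 * n), (2 * (b * c - a * d) + n) // (2 * n))
    return q, g_sub(x, g_mul(y, q))  # remainder x - y*q has delta < delta(y)


def g_ext_gcd(x, y):
    """Return (g, u, v) with u*x + v*y == g and (x, y) = (g), for x, y not both 0."""
    old_r, r = x, y
    old_u, u = (1, 0), (0, 0)        # invariant: old_r = old_u*x + old_v*y
    old_v, v = (0, 0), (1, 0)        # invariant: r = u*x + v*y
    while r != (0, 0):
        q, rem = g_divmod(old_r, r)
        old_r, r = r, rem
        old_u, u = u, g_sub(old_u, g_mul(q, u))
        old_v, v = v, g_sub(old_v, g_mul(q, v))
    return old_r, old_u, old_v


g, u, v = g_ext_gcd((5, 0), (3, 1))  # x = 5, y = 3 + i
print(g, u, v)                       # (-1, -2) (1, 0) (-2, 0)
assert g_add(g_mul(u, (5, 0)), g_mul(v, (3, 1))) == g
assert g_divmod((5, 0), g)[1] == (0, 0) and g_divmod((3, 1), g)[1] == (0, 0)
```

Here g = −1 − 2i is an associate of 2 − i, a common factor of 5 = (2 + i)(2 − i) and 3 + i = (1 + i)(2 − i).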


5.  Gauss’s  Lemma 

In  the  previous  section  we  saw  that  every  principal  ideal  domain  has  unique 
factorization.  In  the  present  section  we  shall  establish  that  certain  additional 
integral domains have unique factorization, namely any integral domain R[X]
for which R is a unique factorization domain. A prototype is Z[X], which will
be seen to have unique factorization even though there exist nonprincipal ideals
like (2, X) in the ring. An important example for applications, particularly in
algebraic geometry, is K[X₁, ..., Xₙ], where K is a field; in this case our result
is to be applied inductively, making use of the isomorphism K[X₁, ..., Xₙ] ≅
K[X₁, ..., Xₙ₋₁][Xₙ] given in Corollary 4.31.

For the conclusion that R[X] has unique factorization if R does, the heart of
the proof is an application of a result known as Gauss’s Lemma, which we shall
prove in this section. Gauss’s Lemma has additional consequences for R[X]
beyond unique factorization, and we give them as well.

Before  coming  to  Gauss’s  Lemma,  let  us  introduce  some  terminology  and 
prove one preliminary result. In any integral domain R, we call two nonzero
elements a and b associates if a = bε for some ε in the group R^× of units. The
property of being associates is an equivalence relation because R^× is a group.

Still  with  the  nonzero  integral  domain  R,  let  us  define  a  greatest  common 
divisor  of  two  nonzero  elements  a  and  b  to  be  any  element  c  of  R  such  that  c 
divides  both  a  and  b  and  such  that  any  divisor  of  a  and  b  divides  c.  Any  associate 
of  a  greatest  common  divisor  of  a  and  b  is  another  greatest  common  divisor  of 
a  and  b.  Conversely  if  a  and  b  have  a  greatest  common  divisor,  then  any  two 
greatest common divisors are associates. In fact, if c and d are greatest common
divisors, then each of them divides both a and b, and the definition forces each
of them to divide the other. Thus d = cε and c = dε′, and then d = dε′ε and
1 = ε′ε. Consequently ε is a unit, and c and d are associates.

If R is a unique factorization domain, then any two nonzero elements a and b
have a greatest common divisor. In fact, we decompose a and b into the product
of a unit by powers of nonassociate irreducible elements as

$$a = \varepsilon \prod_{i=1}^{m} p_i^{k_i} \qquad\text{and}\qquad b = \varepsilon' \prod_{j=1}^{n} {p'_j}^{\,l_j}.$$

For each p′ⱼ such that p′ⱼ is associate to some pᵢ, we replace p′ⱼ by pᵢ in the
factorization of b, adjusting ε′ as necessary, and then we reorder the factors of a
and b so that the common pᵢ's are the ones for 1 ≤ i ≤ r. Then

$$c = \prod_{i=1}^{r} p_i^{\min(k_i,\, l_i)}$$

is a greatest common divisor of a and b. We write GCD(a, b) for a greatest common
divisor of a and b; as we saw above, this is well defined up to a factor of a unit.⁷
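
For R = Z, the recipe for GCD(a, b) through factorizations can be sketched as follows. This Python illustration uses hypothetical names. Since the units of Z are ±1, it drops signs and returns the positive GCD, and it compares the result with the standard-library math.gcd purely as a cross-check.

```python
from math import gcd


def factor(n: int) -> dict[int, int]:
    """Factor |n| (n != 0) into positive primes, as {prime: exponent}.

    In Z the unit epsilon of the text is the sign of n, which is dropped here.
    """
    n = abs(n)
    exps: dict[int, int] = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            exps[p] = exps.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        exps[n] = exps.get(n, 0) + 1
    return exps


def gcd_from_factorizations(a: int, b: int) -> int:
    """GCD of nonzero a, b as the product of p**min(k, l) over the common primes p."""
    fa, fb = factor(a), factor(b)
    c = 1
    for p in fa.keys() & fb.keys():
        c *= p ** min(fa[p], fb[p])
    return c


print(gcd_from_factorizations(360, -84))   # 12 = 2**2 * 3
assert all(
    gcd_from_factorizations(a, b) == gcd(a, b)
    for a in range(-60, 61) for b in range(-60, 61) if a != 0 and b != 0
)
```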

One should not read too much into the notation. In a principal ideal domain if
a and b are nonzero, then, as we shall see momentarily, GCD(a, b) is determined,
up to a unit, by the condition on ideals that


(GCD(a, b)) = (a, b).

This condition implies that there exist elements x and y in R such that

xa + yb = GCD(a, b).

However, in the integral domain Z[X], in which GCD(2, X) = 1, there do not
exist polynomials A(X) and B(X) with 2A(X) + XB(X) = 1; indeed, comparing
constant terms would give 2a₀ = 1 for some integer a₀.

To prove that (GCD(a, b)) = (a, b) in a principal ideal domain, write (c)
for the principal ideal (a, b); c satisfies c = xa + yb for some x and y in R.
Since  a  and  b  lie  in  (c),  a  =  rc  and  b  =  r'c.  Hence  c  divides  both  a  and  b. 
In  the  reverse  direction  if  d  divides  a  and  b,  then  ds  =  a  and  ds'  =  b.  Hence 
c  =  xa  +  yb  =  ( xs  +  ys')d,  and  d  divides  c.  So  c  is  indeed  a  greatest  common 
divisor  of  a  and  b. 

In  a  unique  factorization  domain  the  definition  of  greatest  common  divisor 
immediately  extends  to  apply  to  n  nonzero  elements,  rather  than  just  two.  We 
readily  check  up  to  a  unit  that 

GCD(a₁, ..., aₙ) = GCD(GCD(a₁, ..., aₙ₋₁), aₙ).

Moreover, we can allow any of a₂, ..., aₙ to be 0, and there is no difficulty. In
addition, we have

GCD(da₁, ..., daₙ) = d GCD(a₁, ..., aₙ) up to a unit
if d and a₁ are not 0.

Let R be a unique factorization domain. If A(X) is a nonzero element of R[X],
we  say  that  A(X)  is  primitive  if  the  GCD  of  its  coefficients  is  a  unit.  In  this  case 
no  prime  of  R  divides  all  the  coefficients  of  A(X). 


7 Greatest common divisors can exist for certain integral domains that fail to have unique
factorization, but we shall not have occasion to work with any such domains.

Theorem 8.18 (Gauss’s Lemma). If R is a unique factorization domain, then
the  product  of  primitive  polynomials  is  primitive. 

PROOF #1. Arguing by contradiction, let A(X) = aₘXᵐ + ··· + a₀ and
B(X) = bₙXⁿ + ··· + b₀ be primitive polynomials such that every coefficient of
A(X)B(X) is divisible by some prime p. Since A(X) and B(X) are primitive,
we may choose k and l as small as possible such that p does not divide aₖ and
does not divide bₗ. The coefficient of X^(k+l) in A(X)B(X) is

$$a_0 b_{k+l} + a_1 b_{k+l-1} + \cdots + a_k b_l + \cdots + a_{k+l} b_0$$

(with aᵢ = 0 for i > m and bⱼ = 0 for j > n) and is divisible by p. Then all the
individual terms, and their sum, are divisible by p except possibly for aₖbₗ, and
we conclude that p divides aₖbₗ. Since p is prime and p divides aₖbₗ, p must
divide aₖ or bₗ, contradiction. □

PROOF #2. Arguing by contradiction, let A(X) and B(X) be primitive
polynomials such that every coefficient of A(X)B(X) is divisible by some prime
p. Proposition 8.14 shows that the ideal (p) is prime, and Proposition 8.7
shows that R' = R/(p) is an integral domain. Let φ : R → R'[X] be the
composition of the quotient homomorphism R → R' and the inclusion of R' into
constant polynomials in R'[X], and let Φ : R[X] → R'[X] be the corresponding
substitution homomorphism of Proposition 4.24 that carries X to X. Since A(X)
and B(X) are primitive, Φ(A(X)) and Φ(B(X)) are not zero. Their product
Φ(A(X))Φ(B(X)) = Φ(A(X)B(X)) is 0 since p divides every coefficient of
A(X)B(X), and this conclusion contradicts the assertion of Proposition 4.29 that
R'[X] is an integral domain. □
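
Theorem 8.18 is easy to test numerically for R = Z. In the following Python sketch (names hypothetical; coefficient lists with constant term first), content computes the nonnegative gcd of the coefficients, which is c(A) as defined below. The example multiplies two primitive polynomials, and the product stays primitive.

```python
from functools import reduce
from math import gcd
import random


def content(coeffs):
    """Nonnegative gcd of the coefficients of a polynomial in Z[X] (0 for the zero polynomial)."""
    return reduce(gcd, coeffs, 0)


def poly_mul(p, q):
    """Product of two polynomials given as coefficient lists, constant term first."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


A = [2, 3, 4]        # 2 + 3X + 4X^2, primitive
B = [6, 10, 15]      # 6 + 10X + 15X^2, primitive although no coefficient is +-1
AB = poly_mul(A, B)
print(AB, content(AB))   # [12, 38, 84, 85, 60] 1

# Random spot check of Theorem 8.18 for R = Z.
random.seed(0)
for _ in range(2000):
    p = [random.randint(-20, 20) for _ in range(random.randint(1, 5))]
    q = [random.randint(-20, 20) for _ in range(random.randint(1, 5))]
    if content(p) == 1 and content(q) == 1:
        assert content(poly_mul(p, q)) == 1
```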

Let F be the field of fractions of the unique factorization domain R. The
consequences of Theorem 8.18 exploit a simple relationship between R[X] and
F[X], which we state below as Proposition 8.19. Once that proposition is in hand,
we can state the consequences of Theorem 8.18. If A(X) is a nonzero polynomial
in R[X], let c(A) be the greatest common divisor of the coefficients, i.e.,

c(A) = GCD(aₙ, ..., a₁, a₀) if A(X) = aₙXⁿ + ··· + a₁X + a₀.

The element c(A) is well defined up to a factor of a unit. In this notation the
definition of “primitive” becomes: A(X) is primitive if and only if c(A) is a unit.
We shall make computations with c(A) as if it were a member of R, in order to
keep the notation simple. To be completely rigorous, one should regard c(A) as
an orbit of the group R^× of units in R, using equality to refer to equality of orbits.

If A(X) is not necessarily primitive, then at least c(A) divides each coefficient
of A(X), and hence c(A)⁻¹A(X) is in R[X], say with coefficients bₙ, ..., b₁, b₀.
Then we have

$$\begin{aligned} c(A) &= \mathrm{GCD}(a_n, \ldots, a_1, a_0) = \mathrm{GCD}(c(A)b_n, \ldots, c(A)b_1, c(A)b_0) \\ &= c(A)\,\mathrm{GCD}(b_n, \ldots, b_1, b_0) = c(A)\,c\big(c(A)^{-1}A(X)\big) \end{aligned}$$

up to a unit factor, and hence c(c(A)⁻¹A(X)) is a unit. We conclude that if
A(X) is a nonzero member of R[X], then c(A)⁻¹A(X) is primitive.
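To make c(A) concrete, here is a minimal Python sketch for the special case R = Z. In Z the units are ±1, so the sketch normalizes the GCD to be positive. Representing a polynomial as a list of integer coefficients with the constant term first is an assumption of this sketch, and the helper names `content` and `primitive_part` are illustrative rather than taken from the text.

```python
from functools import reduce
from math import gcd


def content(coeffs):
    """Return c(A), the GCD of the coefficients of a nonzero A(X) in Z[X].

    Coefficients are listed constant term first: [a0, a1, ..., an].
    In Z the GCD is determined up to sign; math.gcd returns it nonnegative.
    """
    if not any(coeffs):
        raise ValueError("c(A) is defined only for nonzero polynomials")
    return reduce(gcd, coeffs)


def primitive_part(coeffs):
    """Return the coefficients of c(A)^{-1} A(X), which the text shows is primitive."""
    c = content(coeffs)
    return [a // c for a in coeffs]  # exact division: c divides every coefficient


A = [6, -4, 10]                       # A(X) = 10X^2 - 4X + 6
print(content(A))                     # 2
print(primitive_part(A))              # [3, -2, 5]
print(content(primitive_part(A)))     # 1, a unit
```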


Proposition 8.19. Let R be a unique factorization domain, and let F be its
field of fractions. If A(X) is any nonzero polynomial in F[X], then there exist α
in F and A₀(X) in R[X] such that A(X) = αA₀(X) with A₀(X) primitive. The
scalar α and the polynomial A₀(X) are unique up to multiplication by units in R.

Remark. We call A₀(X) the associated primitive polynomial to A(X).
According to the proposition, it is unique up to a unit factor in R.

Proof. Let A(X) = cₙXⁿ + ··· + c₁X + c₀ with each cₖ in F. We can write
each cₖ as aₖbₖ⁻¹ with aₖ and bₖ in R and bₖ ≠ 0. We clear fractions. That is,
we let β = ∏_{k=0}^{n} bₖ. Then the kth coefficient of βA(X) is aₖ∏_{i≠k} bᵢ and
is in R. Hence βA(X) is in R[X]. The observation just before the proposition
shows that c(βA)⁻¹βA is primitive. Thus A(X) = αA₀(X) with α = β⁻¹c(βA) and
A₀(X) = c(βA)⁻¹βA(X), A₀(X) being primitive. This proves existence.

If α₁A₁(X) = α₂A₂(X) with α₁ and α₂ in F and with A₁(X) and A₂(X)
primitive, choose r ≠ 0 in R such that rα₁ and rα₂ are in R. Up to unit factors in
R, we then have

$$r\alpha_1 = r\alpha_1c(A_1) = c(r\alpha_1A_1) = c(r\alpha_2A_2) = r\alpha_2c(A_2) = r\alpha_2.$$

Hence, up to a unit factor in R, we have α₁ = α₂, and then A₁(X) and A₂(X)
agree up to the inverse unit factor. This proves uniqueness. □
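The existence argument can be sketched in Python for R = Z and F = Q. The sketch clears denominators and then divides out the content. One deliberate difference from the proof: the proof takes β to be the product of the denominators bₖ, and this sketch uses their least common multiple instead. The lcm is also a common multiple of the denominators, and the uniqueness statement shows that the resulting α and A₀(X) agree with the proof's choice up to a unit. The function name `primitive_decomposition` is hypothetical.

```python
from fractions import Fraction
from functools import reduce
from math import gcd, lcm


def primitive_decomposition(coeffs):
    """Write a nonzero A(X) in Q[X] as alpha * A0(X) with A0(X) primitive in Z[X].

    coeffs lists the coefficients c0, c1, ..., cn of A(X) (anything Fraction accepts).
    Returns (alpha, A0) with alpha a Fraction and A0 a list of ints.
    """
    coeffs = [Fraction(c) for c in coeffs]
    if not any(coeffs):
        raise ValueError("A(X) must be nonzero")
    beta = reduce(lcm, (c.denominator for c in coeffs))  # clears every denominator
    cleared = [int(c * beta) for c in coeffs]            # beta*A(X) lies in Z[X]
    content = reduce(gcd, cleared)                       # c(beta*A), taken positive
    A0 = [a // content for a in cleared]                 # c(beta*A)^{-1} beta*A(X)
    alpha = Fraction(content, beta)                      # alpha = beta^{-1} c(beta*A)
    return alpha, A0


# A(X) = (1/2)X^2 + (1/3)X + 1
alpha, A0 = primitive_decomposition([1, Fraction(1, 3), Fraction(1, 2)])
print(alpha, A0)   # 1/6 [6, 2, 3]
```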

Corollary  8.20.  Let  R  be  a  unique  factorization  domain,  and  let  F  be  its  field 
of  fractions. 

(a)  Let  A(X)  and  B(X)  be  nonzero  polynomials  in  R[X],  and  suppose  that 
B(X)  is  primitive.  If  B(X)  divides  A(X)  in  F[X],  then  it  divides  A(X)  in  R[X]. 

(b) If A(X) is an irreducible polynomial in R[X] of degree > 0, then A(X) is
irreducible in F[X].

(c) If A(X) is a monic polynomial in R[X] and if B(X) is a monic factor of
A(X) within F[X], then B(X) is in R[X].

(d) If A(X), B(X), and C(X) are in R[X] with A(X) primitive and with
A(X) = B(X)C(X), then B(X) and C(X) are primitive.

PROOF. In (a), write A(X) = B(X)Q(X) in F[X], and let Q(X) = βQ₀(X) be
a decomposition of Q(X) as in Proposition 8.19. Since c(A)⁻¹A(X) is primitive,
the corresponding decomposition of A(X) is A(X) = c(A)(c(A)⁻¹A(X)). The
equality A(X) = βB(X)Q₀(X) then reads c(A)(c(A)⁻¹A(X)) = βB(X)Q₀(X).
Since B(X)Q₀(X) is primitive according to Theorem 8.18, the uniqueness in
Proposition 8.19 shows that c(A)⁻¹A(X) = B(X)Q₀(X) except possibly for a
unit factor in R. Then B(X) divides A(X) with quotient c(A)Q₀(X), apart from
a unit factor in R. Since c(A)Q₀(X) is in R[X], (a) is proved.
In (b), the condition that deg A(X) > 0 implies that A(X) is not a unit in
F[X]. Arguing by contradiction, suppose that A(X) = B(X)Q(X) in F[X] with
neither of B(X) and Q(X) of degree 0. Let B(X) = βB₀(X) be a decomposition
of B(X) as in Proposition 8.19. Then we have A(X) = B₀(X)(βQ(X)), and (a)
shows that βQ(X) is in R[X]. Since B₀(X) and βQ(X) both have positive degree,
neither is a unit in R[X], and this factorization contradicts the assumed
irreducibility of A(X) in R[X].

In (c), write A(X) = B(X)Q(X), and let B(X) = βB₀(X) be a decomposition
of B(X) as in Proposition 8.19. Then we have A(X) = B₀(X)(βQ(X)) with
βQ(X) in F[X]. Conclusion (a) shows that βQ(X) is in R[X]. If b ∈ R is the
leading coefficient of B₀(X) and if q ∈ R is the leading coefficient of βQ(X), then
we have 1 = bq, and consequently b and q are units in R. Since B(X) = βB₀(X)
and B(X) is monic, 1 = βb, and therefore β = b⁻¹ is a unit in R. Hence B(X)
is in R[X].

In (d), we argue along the same lines as in (a). We may take
B(X) = c(B)(c(B)⁻¹B(X)) and C(X) = c(C)(c(C)⁻¹C(X)) as decompositions of
B(X) and C(X) according to Proposition 8.19. Then we have

$$A(X) = \big(c(B)c(C)\big)\big[c(B)^{-1}B(X)\,c(C)^{-1}C(X)\big].$$

Theorem 8.18 says that the factor in brackets is primitive, and the uniqueness in
Proposition 8.19 shows that 1 = c(B)c(C), up to unit factors. Therefore c(B) and
c(C) are units in R, and B(X) and C(X) are primitive. □
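Run without the assumption that A(X) is primitive, the argument for (d) shows more generally that c(BC) = c(B)c(C) up to a unit. The following Python sketch checks this numerically for R = Z. The coefficient-list representation (constant term first) and the helper names are assumptions made for the illustration.

```python
from functools import reduce
from math import gcd


def content(coeffs):
    """GCD of the coefficients (constant term first), normalized to be nonnegative."""
    return reduce(gcd, coeffs)


def poly_mul(p, q):
    """Product of two polynomials in Z[X] given as coefficient lists."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


B = [2, 4, 6]      # B(X) = 6X^2 + 4X + 2, so c(B) = 2
C = [9, 0, 3]      # C(X) = 3X^2 + 9,      so c(C) = 3
A = poly_mul(B, C)
print(content(A), content(B) * content(C))   # 6 6
```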

Corollary 8.21. If R is a unique factorization domain, then the ring R[X] is
a unique factorization domain.

Remark. As was mentioned at the beginning of the section, Z[X] and
K[X₁, ..., Xₙ], when K is a field, are unique factorization domains as a
consequence of this result (the latter by induction on n).

PROOF. We begin with the proof of (UFD1). Suppose that A(X) is a nonzero
member of R[X]. We may take its decomposition according to Proposition 8.19
to be A(X) = c(A)(c(A)⁻¹A(X)). Consider divisors of c(A)⁻¹A(X) in R[X].
These are all primitive, according to (d). Hence those of degree 0 are units
in R. Thus any nontrivial factorization of c(A)⁻¹A(X) is into two factors of
strictly lower degree, both primitive. In a finite number of steps, this process of
factorization with primitive factors has to stop. We can then factor c(A) within R.
Combining the factorizations of c(A) and c(A)⁻¹A(X), we obtain a factorization
of A(X).

For (UFD2'), let P(X) be irreducible in R[X]. Since the factorization
P(X) = c(P)(c(P)⁻¹P(X)) has to be trivial, either c(P) is a unit, in which case
P(X) is primitive, or c(P)⁻¹P(X) is a unit, in which case P(X) has degree 0. In
either case, suppose that P(X) divides a product A(X)B(X).

In the first case, P(X) is primitive, and it has positive degree because a
primitive polynomial of degree 0 is a unit; by (b), P(X) is then irreducible in
F[X]. Since F[X] is a principal ideal domain, hence a unique factorization
domain, either P(X) divides A(X) in F[X] or P(X) divides B(X) in F[X]. By
symmetry we may assume that P(X) divides A(X) in F[X]. Then (a) shows that
P(X) divides A(X) in R[X].

In the second case, P(X) = P has degree 0 and, being irreducible in the unique
factorization domain R, is prime in R. Put R' = R/(P) as in Proof #2 of
Theorem 8.18. Then A(X)B(X) maps to zero in the integral domain R'[X], and
hence A(X) or B(X) is in PR[X]. □

The  final  application,  Eisenstein’s  irreducibility  criterion,  is  proved  somewhat 
in  the  style  of  Gauss’s  Lemma  (Theorem  8.18).  We  shall  give  only  the  analog  of 
Proof  #1  of  Gauss’s  Lemma,  leaving  the  analog  of  Proof  #2  to  Problem  21  at  the 
end  of  the  chapter. 

Corollary 8.22 (Eisenstein's irreducibility criterion). Let R be a unique
factorization domain, let F be its field of fractions, and let p be a prime in R. If
A(X) = a_N X^N + ··· + a₁X + a₀ is a polynomial of degree ≥ 1 in R[X] such
that p divides a_{N−1}, ..., a₀ but not a_N and such that p² does not divide a₀, then
A(X) is irreducible in F[X].

Remark. The polynomial A(X) will be irreducible in R[X] also unless all its
coefficients are divisible by some nonunit of R.

PROOF. Without loss of generality, we may replace A(X) by c(A)⁻¹A(X)
and thereby assume that A(X) is primitive. This adjustment makes use of the
hypothesis that p does not divide a_N: that hypothesis ensures that p does not
divide c(A), so the hypotheses on the coefficients remain valid for c(A)⁻¹A(X).
Corollary 8.20b shows that it is enough to prove irreducibility in R[X].
Assuming the contrary, suppose that A(X) factors in R[X] as A(X) = B(X)C(X)
with B(X) = bₘXᵐ + ··· + b₁X + b₀, C(X) = cₙXⁿ + ··· + c₁X + c₀, and neither
of B(X) and C(X) equal to a unit. Corollary 8.20d shows that B(X) and C(X)
are primitive. In particular, B(X) and C(X) have to be nonconstant polynomials.
Define aₖ = 0 for k > N, bₖ = 0 for k > m, and cₖ = 0 for k > n. Since p
divides a₀ = b₀c₀ and p is prime, p divides either b₀ or c₀. Without loss of
generality, suppose that p divides b₀. Since p² does not divide a₀, p does not
divide c₀.

We show, by induction on k, that p divides bₖ for every k < N. The case
k = 0 is the base case of the induction. If p divides bⱼ for j < k, then we have

$$a_k = b_0c_k + b_1c_{k-1} + \cdots + b_{k-1}c_1 + b_kc_0.$$

Since k < N, the left side is divisible by p. The inductive hypothesis shows
that p divides every term on the right side except possibly the last. Consequently
p divides bₖc₀. Since p does not divide c₀, p divides bₖ. This completes the
induction.

Since C(X) is nonconstant, the degree of B(X) is < N, and therefore we have
shown that every coefficient of B(X) is divisible by p. Then c(B) is divisible by
p, in contradiction to the fact that B(X) is primitive. □
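The hypotheses of Corollary 8.22 are easy to test mechanically for R = Z. The Python sketch below only checks those hypotheses; it is not a general irreducibility test. A `True` result gives irreducibility in Q[X] by the corollary, while `False` proves nothing. The sketch assumes that `p` really is a prime, which it does not verify, and the name `satisfies_eisenstein` is hypothetical.

```python
def satisfies_eisenstein(coeffs, p):
    """Check the hypotheses of Corollary 8.22 for A(X) in Z[X] and a prime p.

    coeffs lists a0, a1, ..., aN (constant term first) with aN != 0 and N >= 1.
    Returns True when p divides a0, ..., a_{N-1}, p does not divide aN, and
    p^2 does not divide a0.  Primality of p is assumed, not checked.
    """
    *lower, leading = coeffs
    if not lower or leading == 0:
        raise ValueError("need degree >= 1 and a nonzero leading coefficient")
    return (
        all(a % p == 0 for a in lower)
        and leading % p != 0
        and coeffs[0] % (p * p) != 0
    )


print(satisfies_eisenstein([2, 0, 0, 1], 2))   # X^3 + 2: True
print(satisfies_eisenstein([4, 0, 1], 2))      # X^2 + 4: False, since 2^2 divides a0
```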
Examples.

(1) Cyclotomic polynomials in Q[X]. Let us see for each prime number p that
the polynomial Φ(X) = X^{p−1} + X^{p−2} + ··· + X + 1 is irreducible in Q[X].
We have Xᵖ − 1 = (X − 1)Φ(X). Replacing X − 1 by Y gives
(Y + 1)ᵖ − 1 = YΦ(Y + 1). By the Binomial Theorem,

$$(Y+1)^p - 1 = \sum_{k=1}^{p} \binom{p}{k} Y^k, \qquad\text{and hence}\qquad \Phi(Y+1) = \sum_{k=1}^{p} \binom{p}{k} Y^{k-1}.$$

Since p is prime, the binomial coefficients in this sum with 1 ≤ k ≤ p − 1 are
divisible by p. Moreover, the constant term of Φ(Y + 1) is p, which is not
divisible by p², and the leading coefficient is 1. Therefore the polynomial
Ψ(Y) = Φ(Y + 1) satisfies the conditions of Corollary 8.22 for the ring Z. Hence
Ψ(Y) is irreducible in Q[Y]. A nontrivial factorization of Φ(X) would yield a
nontrivial factorization of Ψ(Y), and hence Φ(X) is irreducible in Q[X].

(2) Certain polynomials in K[X, Y] when K is a field. Since K[X, Y] =
K[X][Y], it follows that K[X, Y] is a unique factorization domain, and any
member of K[X, Y] can be written as A(X, Y) = aₙ(X)Yⁿ + ··· + a₁(X)Y + a₀(X).
The polynomial X is prime in K[X], and Corollary 8.22 therefore says that
A(X, Y) is irreducible in K(X)[Y] if X does not divide aₙ(X) in K[X], X divides
a_{n−1}(X), ..., a₀(X) in K[X], and X² does not divide a₀(X) in K[X]. The remark
with the corollary points out that A(X, Y) is irreducible in K[X, Y] if also there
is no nonconstant polynomial in K[X] that divides every aₖ(X). For example,
Y⁵ + XY² + XY + X is irreducible in K[X, Y].
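Both examples apply the same three divisibility tests to the coefficients, so one Python sketch can cover them both. The sketch passes the tests in as functions. For R = Z it uses `%`. For R = K[X] with the prime X, each coefficient aₖ(X) is stored as a list of its own coefficients, constant term first. In that case X divides a(X) exactly when a(0) = 0, and X² divides a(X) exactly when the constant and linear coefficients both vanish. The data layout and all function names are assumptions of the sketch. As before, a `True` result only certifies the hypotheses, and irreducibility then follows from the corollary.

```python
from math import comb


def eisenstein_holds(coeffs, divisible, divisible_sq):
    """Hypotheses of Corollary 8.22 for A = a0 + a1*Y + ... + aN*Y^N.

    divisible(a) tests whether the prime divides a, and divisible_sq(a)
    whether its square does; this lets one check serve R = Z and R = K[X].
    """
    *lower, leading = coeffs
    return (all(divisible(a) for a in lower)
            and not divisible(leading)
            and not divisible_sq(coeffs[0]))


# Example (1): R = Z; Phi(Y + 1) = sum_{k=1}^{p} C(p, k) Y^(k-1), constant term first.
def shifted_cyclotomic(p):
    return [comb(p, k) for k in range(1, p + 1)]


def integer_prime_tests(p):
    return (lambda a: a % p == 0), (lambda a: a % (p * p) == 0)


for p in [2, 3, 5, 7, 11]:
    assert eisenstein_holds(shifted_cyclotomic(p), *integer_prime_tests(p))


# Example (2): R = K[X] with the prime X; each a_k(X) is a list, constant term first.
def x_divides(a):
    return all(c == 0 for c in a[:1])      # a(0) = 0 (the zero list counts)


def x_squared_divides(a):
    return all(c == 0 for c in a[:2])      # constant and linear terms vanish


# Y^5 + X*Y^2 + X*Y + X: a0 = a1 = a2 = X, a3 = a4 = 0, a5 = 1.
example = [[0, 1], [0, 1], [0, 1], [], [], [1]]
print(eisenstein_holds(example, x_divides, x_squared_divides))   # True
```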
The Fundamental Theorem of Finitely Generated Abelian Groups (Theorem 4.56)
says that every finitely generated abelian group is a direct sum of cyclic groups.
If we think of abelian groups as Z modules, we can ask whether this theorem
has some analog in the context of R modules. The answer is yes: the theorem
readily extends to the case that Z is replaced by an arbitrary principal ideal domain.
The surprising addendum to the answer is that we have already treated a second
special case of the generalized theorem. That case arises when the principal ideal
domain is K[X] for some field K. If V is a finite-dimensional vector space over
K and L : V → V is a K linear map, then V becomes a K[X] module under
the definition Xv = L(v). This module is finitely generated even without the
X present because V is finite-dimensional, and the generalized theorem that we
prove in this section recovers the analysis of L that we carried out in Chapter V.
When K is algebraically closed, we obtain the Jordan canonical form; for general
K, we obtain a different canonical form involving cyclic subspaces that was
worked out in Problems 32-40 at the end of Chapter V.
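The module structure Xv = L(v) extends to all of K[X] by letting a polynomial f(X) act as f(L). Here is a minimal Python sketch with K = Q that evaluates f(L)v by Horner's rule. The list-of-rows matrix format, the constant-term-first coefficient lists, and the function names are all assumptions of the illustration.

```python
from fractions import Fraction


def apply_matrix(L, v):
    """Return L(v) for a square matrix L (list of rows) and a vector v."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in L]


def poly_action(coeffs, L, v):
    """Compute f(X)v = f(L)(v) in the K[X] module defined by Xv = L(v).

    coeffs lists f's coefficients constant term first; Horner's rule gives
    f(L)v = a0 v + L(a1 v + L(a2 v + ...)).
    """
    result = [Fraction(0)] * len(v)
    for a in reversed(coeffs):
        result = apply_matrix(L, result)                            # multiply by X
        result = [r + Fraction(a) * x for r, x in zip(result, v)]   # add a*v
    return result


# Quarter-turn rotation of Q^2: X^2 + 1 annihilates every element of this module.
L = [[0, -1], [1, 0]]
v = [1, 2]
print(poly_action([0, 1], L, v) == apply_matrix(L, v))   # True: X acts as L
print(poly_action([1, 0, 1], L, v))                      # [Fraction(0, 1), Fraction(0, 1)]
```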

The  definitions  for  the  generalization  of  Theorem  4.56  are  as  follows.  Let 
R  be  a  principal  ideal  domain.  A  subset  S  of  an  R  module  M  is  called  a  set 
of  generators  of  M  if  M  is  the  smallest  R  submodule  of  M  containing  all  the 
members of S. If {m_s | s ∈ S} is a subset of M, then the set of all finite
sums Σ_{s∈S} r_s m_s is an R submodule, but it need not contain the elements m_s and
therefore need not be the R submodule generated by all the m_s. However, if M
is unital, then taking r_{s₀} = 1 and all other r_s equal to 0 exhibits m_{s₀} as a
member of the R submodule of all finite sums Σ_{s∈S} r_s m_s. For this reason we
shall insist that all the R modules in this section be unital.

We  say  that  the  R  module  M  is  finitely  generated  if  it  has  a  finite  set  of 
generators.  The  main  theorem  gives  the  structure  of  unital  finitely  generated  R 
modules  when  R  is  a  principal  ideal  domain.  We  need  to  take  a  small  preliminary 
step  that  eliminates  technical  complications  from  the  discussion,  the  same  step 
that  was  carried  out  in  Lemma  4.51  and  Proposition  4.52  in  the  case  of  Z  modules, 
i.e.,  abelian  groups. 

Lemma 8.23. Let R be a commutative ring with identity, and let φ : M → N
be a homomorphism of unital R modules. If ker φ and image φ are finitely
generated, then M is finitely generated.

PROOF. Let {x₁, ..., xₘ} and {y₁, ..., yₙ} be respective finite sets of generators
for ker φ and image φ. For 1 ≤ j ≤ n, choose x'ⱼ in M with φ(x'ⱼ) = yⱼ. We shall
prove that {x₁, ..., xₘ, x'₁, ..., x'ₙ} is a set of generators for M. Thus let x be in M.
Since φ(x) is in image φ, there exist r₁, ..., rₙ in R with φ(x) = r₁y₁ + ··· + rₙyₙ.
The element x' = r₁x'₁ + ··· + rₙx'ₙ of M has φ(x') = r₁y₁ + ··· + rₙyₙ = φ(x).
Therefore φ(x − x') = 0, and there exist s₁, ..., sₘ in R such that
x − x' = s₁x₁ + ··· + sₘxₘ. Consequently

$$x = s_1x_1 + \cdots + s_mx_m + x' = s_1x_1 + \cdots + s_mx_m + r_1x'_1 + \cdots + r_nx'_n. \qquad\square$$

Proposition  8.24.  If  R  is  a  principal  ideal  domain,  then  any  R  submodule 
of  a  finitely  generated  unital  R  module  is  finitely  generated.  Moreover,  any  R 
submodule  of  a  singly  generated  unital  R  module  is  singly  generated. 

Remark.  The  proof  will  show  that  if  M  can  be  generated  by  n  elements,  then 
so  can  the  unital  R  submodule. 

PROOF. Let M be unital and finitely generated with a set {m₁, ..., mₙ} of n
generators, and define Mₖ = Rm₁ + ··· + Rmₖ for 1 ≤ k ≤ n. Then Mₙ = M
since M is unital. We shall prove by induction on k that every R submodule of
Mₖ is finitely generated. The case k = n then gives the proposition. For k = 1,
suppose that S is an R submodule of M₁ = Rm₁. Since S is an R submodule
and every member of S lies in Rm₁, the subset I of all r in R with rm₁ in S is
an ideal with Im₁ = S. Since every ideal in R is singly generated, we can write
I = (r₀). Then S = Im₁ = Rr₀m₁, and the single element r₀m₁ generates S.

Assume inductively that every R submodule of Mₖ is known to be finitely
generated, and let N_{k+1} be an R submodule of M_{k+1}. Let
q : M_{k+1} → M_{k+1}/Mₖ be the quotient R homomorphism, and let φ be the
restriction q|_{N_{k+1}}, mapping N_{k+1} into M_{k+1}/Mₖ. Then ker φ = N_{k+1} ∩ Mₖ
is an R submodule of Mₖ and is finitely generated by the inductive hypothesis.
Also, image φ is an R submodule of M_{k+1}/Mₖ, which is singly generated with
generator equal to the coset of m_{k+1}. Since an R submodule of a singly generated
unital R module was shown in the previous paragraph to be singly generated,
image φ is finitely generated. Applying Lemma 8.23 to φ, we see that N_{k+1} is
finitely generated. This completes the induction and the proof. □

According to the definition in Example 9 of modules in Section 1, a free R
module is a direct sum, finite or infinite, of copies of the R module R. A free R
module is said to have finite rank if it can be written as a finite direct sum of
copies of R. A unital R module M is said to be cyclic if it is singly generated,
i.e., if M = Rm₀ for some m₀ in M. In this case, we have an R isomorphism
M ≅ R/I, where I is the ideal {r ∈ R | rm₀ = 0}.

Before coming to the statement of the theorem and the proof, let us discuss
the heart of the matter, which is related to row reduction of matrices. We regard
the space M_{1n}(R) of all 1-row matrices with n entries in R as a free R module.
Suppose that R is a principal ideal domain, and suppose that we have a particular
2-by-n matrix with entries in R and with the property that the two rows have
nonzero elements a and b, respectively, in the first column. We can regard
the set of R linear combinations of the two rows of our particular matrix as
an R submodule of the free R module M_{1n}(R). Let c = GCD(a, b). This
member of R is defined only up to multiplication by a unit, but we make a
definite choice of it. The idea is that we can do a kind of invertible row-reduction
step that simultaneously replaces the two rows of our 2-by-n matrix by a first row
whose first entry is c and a second row whose first entry is 0; in the process the
corresponding R submodule of M_{1n}(R) will be unchanged. In fact, we saw in the
previous section that the hypothesis on R implies that there exist members x and
y of R with xa + yb = c. Since c divides a and b, we can rewrite this equality as
x(ac⁻¹) + y(bc⁻¹) = 1. Then the 2-by-2 matrix

$$M = \begin{pmatrix} x & y \\ -bc^{-1} & ac^{-1} \end{pmatrix}$$

with entries in R has the property that

$$\begin{pmatrix} x & y \\ -bc^{-1} & ac^{-1} \end{pmatrix}\begin{pmatrix} a & * \\ b & * \end{pmatrix} = \begin{pmatrix} c & * \\ 0 & * \end{pmatrix}.$$

This equation shows explicitly that the rows of the new matrix, whose first
column is (c, 0), lie in the R linear span of the rows of the given matrix, whose
first column is (a, b). The key fact about M is that its determinant
x(ac⁻¹) + y(bc⁻¹) is 1 and that M is therefore invertible with entries in R: the
inverse is just

$$M^{-1} = \begin{pmatrix} ac^{-1} & -y \\ bc^{-1} & x \end{pmatrix}.$$

This invertibility shows that the rows of the given matrix lie in the R linear span
of the rows of the new one. Consequently the R linear span of the rows of our
given 2-by-n matrix is preserved under left multiplication by M.
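For R = Z, the x and y can be found with the extended Euclidean algorithm, and the matrix M above can then be written down directly. The following Python sketch assumes a and b are nonzero integers and normalizes the GCD c to be positive. The function names are illustrative.

```python
def extended_gcd(a, b):
    """Return (c, x, y) with x*a + y*b = c = gcd(a, b) and c >= 0."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:                       # invariant: old_x*a + old_y*b == old_r
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    if old_r < 0:                       # choose the unit factor so that c > 0
        old_r, old_x, old_y = -old_r, -old_x, -old_y
    return old_r, old_x, old_y


def reduction_matrix(a, b):
    """Return (M, c) with M = [[x, y], [-b/c, a/c]] as in the text, for R = Z."""
    c, x, y = extended_gcd(a, b)
    return [[x, y], [-(b // c), a // c]], c


M, c = reduction_matrix(12, 18)
print(M, c)   # [[-1, 1], [-3, 2]] 6
# det M = 1, and M sends the column (12, 18) to (6, 0).
```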
In effect we can do the same kind of row reduction of matrices over R as we
did with matrices over Z in the proof of Theorem 4.56. The only difference is
that this time we do not see constructively how to find the x and y that relate a,
b, and c. Thus we would lack some information if we actually wanted to follow
through and calculate a particular example. We were able to make calculations
to imitate the proof of Theorem 4.56 because we could use the Euclidean
algorithm to find x and y. In the present context we would be able
to make explicit calculations if R were a Euclidean domain.
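K[X] is a Euclidean domain, so for R = Q[X] the x and y can be computed explicitly. The Python sketch below runs the extended Euclidean algorithm on polynomials with rational coefficients. Polynomials are lists of `Fraction`s with the constant term first, and the empty list is the zero polynomial. That representation and every helper name are assumptions of the sketch. It assumes both inputs are nonzero, and it divides by a unit at the end so that the GCD is monic.

```python
from fractions import Fraction


def trim(p):
    """Drop trailing zero coefficients; polynomials are lists, constant term first."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p


def poly_mul(a, b):
    if not a or not b:
        return []
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, s in enumerate(a):
        for j, t in enumerate(b):
            out[i + j] += s * t
    return trim(out)


def poly_sub(a, b):
    n = max(len(a), len(b))
    return trim([(a[i] if i < len(a) else 0) - (b[i] if i < len(b) else 0) for i in range(n)])


def poly_divmod(a, b):
    """Division with remainder in Q[X]: a = q*b + r with r = 0 or deg r < deg b."""
    r = trim(Fraction(c) for c in a)
    b = trim(Fraction(c) for c in b)
    q = [Fraction(0)] * max(len(r) - len(b) + 1, 1)
    while len(r) >= len(b):
        shift = len(r) - len(b)
        factor = r[-1] / b[-1]
        q[shift] = factor
        for i, c in enumerate(b):
            r[i + shift] -= factor * c
        r = trim(r)                      # the leading coefficient is now exactly 0
    return trim(q), r


def poly_ext_gcd(a, b):
    """Return (g, x, y) with x*a + y*b = g, g the monic GCD of nonzero a, b in Q[X]."""
    old_r, r = trim(Fraction(c) for c in a), trim(Fraction(c) for c in b)
    old_x, x = [Fraction(1)], []
    old_y, y = [], [Fraction(1)]
    while r:                             # invariant: old_x*a + old_y*b == old_r
        q, rem = poly_divmod(old_r, r)
        old_r, r = r, rem
        old_x, x = x, poly_sub(old_x, poly_mul(q, x))
        old_y, y = y, poly_sub(old_y, poly_mul(q, y))
    lead = old_r[-1]                     # a unit of Q[X]; dividing makes the GCD monic
    return [c / lead for c in old_r], [c / lead for c in old_x], [c / lead for c in old_y]


# gcd(X^2 - 1, X^2 - 3X + 2) = X - 1
g, x, y = poly_ext_gcd([-1, 0, 1], [2, -3, 1])
```

In this example the sketch returns g = [-1, 1] (that is, X − 1), with x = 1/3 and y = −1/3.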


Theorem  8.25  (Fundamental  Theorem  of  Finitely  Generated  Modules).  If  R 
is  a  principal  ideal  domain,  then 

(a)  the  number  of  R  summands  in  a  free  R  module  of  finite  rank  is  independent 
of  the  direct-sum  decomposition, 

(b) any R submodule of a free R module of finite rank n is a free R module
of rank ≤ n,

(c)  any  finitely  generated  unital  R  module  is  the  finite  direct  sum  of  cyclic 
modules. 

Remark.  Because  of  (a),  it  is  meaningful  to  speak  of  the  rank  of  a  free 
R  module  of  finite  rank;  it  is  the  number  of  R  summands.  By  convention 
the  0  module  is  a  free  R  module  of  rank  0.  Then  the  statement  of  (b)  makes 
sense.  Statement  (c)  will  be  amplified  in  Corollary  8.29  below.  Some  people 
use  the  name  “Fundamental  Theorem  of  Finitely  Generated  Modules”  to  refer  to 
Corollary  8.29  rather  than  to  Theorem  8.25. 

Proof. Let F be a free R module of the form Rx₁ ⊕ ··· ⊕ Rxₙ, and
suppose that y₁, ..., yₘ are elements of F such that no nontrivial combination
r₁y₁ + ··· + rₘyₘ is 0. We argue as in the proof of Proposition 2.2. Define an
m-by-n matrix C with entries in R by yᵢ = Σ_{j=1}^{n} Cᵢⱼxⱼ for 1 ≤ i ≤ m. If Q is
the field of fractions of R, then we can regard C as a matrix with entries in Q. As
such, the matrix has rank ≤ n. If m > n, then the rows are linearly dependent,
and we can find members q₁, ..., qₘ of Q, not all 0, such that Σ_{i=1}^{m} qᵢCᵢⱼ = 0
for 1 ≤ j ≤ n. Clearing fractions, we obtain members r₁, ..., rₘ of R, not all 0,
such that Σ_{i=1}^{m} rᵢCᵢⱼ = 0 for 1 ≤ j ≤ n. Then

$$\sum_{i=1}^{m} r_iy_i = \sum_{i=1}^{m} r_i\Big(\sum_{j=1}^{n} C_{ij}x_j\Big) = \sum_{j=1}^{n}\Big(\sum_{i=1}^{m} r_iC_{ij}\Big)x_j = \sum_{j=1}^{n} 0\,x_j = 0,$$

in contradiction to the assumed independence property of y₁, ..., yₘ. Therefore
we must have m ≤ n.

If we apply this conclusion to a set x₁, ..., xₙ that exhibits F as free and to
another set, possibly infinite, that does the same thing, we find that the second
set has ≤ n members. Reversing the roles of the two sets, we find that they both
have n members. This proves (a).

For (b) and (c), we shall reduce the result to a lemma saying that a certain kind
of result can be achieved by row and column reduction of matrices with entries in
R. Let F be a free R module of rank n, defined by a subset x₁, ..., xₙ of F, and let
M be an R submodule of F. Proposition 8.24 shows that M is finitely generated.
We let y₁, ..., yₘ be generators, not necessarily with any independence property.

Define an m-by-n matrix C with entries in R by yᵢ = Σ_{j=1}^{n} Cᵢⱼxⱼ. We can recover
F as the set of R linear combinations of x₁, ..., xₙ, and we can recover M as the
set of R linear combinations of y₁, ..., yₘ.

If B is an n-by-n matrix with entries in R and with determinant in the group
R^× of units, then Corollary 5.5 shows that B⁻¹ exists and has entries in R. If
we define x'ᵢ = Σ_{j=1}^{n} Bᵢⱼxⱼ, then any R linear combination of x'₁, ..., x'ₙ is an
R linear combination of x₁, ..., xₙ. Also, the computation

$$\sum_{i=1}^{n} (B^{-1})_{ki}\,x'_i = \sum_{i,j=1}^{n} (B^{-1})_{ki}B_{ij}\,x_j = \sum_{j=1}^{n} \delta_{kj}\,x_j = x_k$$

shows that any R linear combination of x₁, ..., xₙ is an R linear combination of
x'₁, ..., x'ₙ. Relative to x'₁, ..., x'ₙ, the elements yᵢ are given by the matrix
CB⁻¹, since yᵢ = Σₖ (CB⁻¹)ᵢₖx'ₖ. As B ranges over all n-by-n matrices with
entries in R and with determinant in R^×, so does B⁻¹; thus we can recover the
same F and M if we replace C by CB for any such B. Arguing in the same way
with y₁, ..., yₘ and y'₁, ..., y'ₘ, we see that we can recover the same F and M if
we replace CB by ACB, where A is an m-by-m matrix with entries in R and with
determinant in R^×.

Lemma 8.26 below will say that we can find A and B such that the nonzero
entries of D = ACB are exactly the diagonal ones Dₖₖ for 1 ≤ k ≤ l, where l is
a certain integer with 0 ≤ l ≤ min(m, n).

That is, the resulting equations relating y'₁, ..., y'ₘ to x'₁, ..., x'ₙ
will be of the form

$$y'_k = \begin{cases} D_{kk}\,x'_k & \text{for } 1 \le k \le l, \\ 0 & \text{for } l+1 \le k \le m. \end{cases} \tag{$*$}$$

Now let us turn to (b) and (c). For (b), the claim is that the elements y'ₖ with
1 ≤ k ≤ l exhibit M as a free R module. We know that y'₁, ..., y'ₘ generate M
and hence that y'₁, ..., y'ₗ generate M. For the independence, suppose that
r₁, ..., rₗ are members of R such that Σ_{k=1}^{l} rₖy'ₖ = 0. Then substitution
gives Σ_{k=1}^{l} rₖDₖₖx'ₖ = 0, and the independence of x'₁, ..., x'ₗ forces rₖDₖₖ = 0
for 1 ≤ k ≤ l. Since R is an integral domain and each Dₖₖ is nonzero, rₖ = 0 for
such k. Thus indeed the elements y'ₖ with 1 ≤ k ≤ l exhibit M as a free R module.
Since l ≤ min(m, n), the rank of M is at most the rank of F.

For (c), let S be a finitely generated unital R module, say with n generators.
By the universal mapping property of free R modules (Example 9 in Section 1),
there exists a free R module F of rank n with S as quotient. Let x₁, ..., xₙ be
generators of F that exhibit F as free, and let M be the kernel of the quotient R
homomorphism F → S, so that S ≅ F/M. Then (b) shows that M is a free
R module of rank m ≤ n. Let y₁, ..., yₘ be generators of M that exhibit M
as free, and define an m-by-n matrix C with entries in R by yᵢ = Σ_{j=1}^{n} Cᵢⱼxⱼ
for 1 ≤ i ≤ m. The result is that we are reduced to the situation we have just
considered, and we can obtain equations of the form (*) relating their respective
generators, namely y'₁, ..., y'ₘ for M and x'₁, ..., x'ₙ for F.

For 1 ≤ k ≤ n, define Fₖ = Rx'ₖ and

$$M_k = \begin{cases} Ry'_k = RD_{kk}\,x'_k & \text{for } 1 \le k \le l, \\ 0 & \text{for } l+1 \le k \le n, \end{cases}$$

so that M = M₁ ⊕ ··· ⊕ Mₙ. Then Fₖ/Mₖ is R isomorphic to the cyclic R module
R/(Dₖₖ) if 1 ≤ k ≤ l, while Fₖ/Mₖ = Fₖ is isomorphic to the cyclic R module
R if l + 1 ≤ k ≤ n. Applying Proposition 8.5, we obtain

$$F/M = (F_1 \oplus \cdots \oplus F_n)/(M_1 \oplus \cdots \oplus M_n) \cong (F_1/M_1) \oplus \cdots \oplus (F_n/M_n).$$

Thus F/M is exhibited as a direct sum of cyclic R modules. □

To  complete  the  proof  of  Theorem  8.25,  we  are  left  with  proving  the  following 
lemma,  which  is  where  row  and  column  reduction  take  place. 

Lemma 8.26. Let R be a principal ideal domain. If C is an m-by-n matrix
with entries in R, then there exist an m-by-m matrix A with entries in R and
with determinant in R^× and an n-by-n matrix B with entries in R and with
determinant in R^× such that for some l with 0 ≤ l ≤ min(m, n), the nonzero
entries of D = ACB are exactly the diagonal entries D₁₁, D₂₂, ..., Dₗₗ.

PROOF. The matrices A and B will be constructed as products of matrices of
determinant ±1, and then det A and det B equal ±1 by Proposition 5.1a. The
matrix A will correspond to row operations on C, and B will correspond to
column operations. Each factor will be the identity except in some 2-by-2 block.
Among the row and column operations of interest are the interchange of two
rows or two columns, in which the 2-by-2 block is

$$\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.$$

Another row operation of interest replaces two rows having respective jth entries
a and b by R linear combinations of them in which a and b are replaced by
c = GCD(a, b) and 0. If x(ac⁻¹) + y(bc⁻¹) = 1, then the 2-by-2 block is

$$\begin{pmatrix} x & y \\ -bc^{-1} & ac^{-1} \end{pmatrix}.$$

A similar operation is possible with columns.

The reduction involves an induction that successively constructs the entries
D₁₁, D₂₂, ..., Dₗₗ, stopping when the part of C involving rows and columns
numbered ≥ l + 1 has been replaced by 0. We start by interchanging rows and
columns to move a nonzero entry into position (1, 1). By a succession of row
operations as in the previous paragraph, we can reduce the entry in position (1, 1)
to the greatest common divisor of the entries of C in the first column, while
reducing the remaining entries of the first column to 0. Next we do the same
thing with column operations, reducing the entry in position (1, 1) to the greatest
common divisor of the members of the first row, while reducing the remaining
entries of the first row to 0. Then we go back and repeat the process with row
operations and with column operations as many times as necessary until all the
entries of the first row and column other than the one in position (1, 1) are 0. We
need to check that this process indeed terminates at some point. If the entries that
appear in position (1, 1) as the iterations proceed are c₁, c₂, c₃, ..., then we have
(c₁) ⊆ (c₂) ⊆ (c₃) ⊆ ···. The union of these ideals is an ideal, necessarily a
principal ideal of the form (c), and c occurs in one of the ideals in the union; the
chain of ideals must be constant after that stage. Once the corner entry becomes
constant, say equal to a, it divides every entry b that remains to be eliminated,
and the matrices

$$\begin{pmatrix} x & y \\ -bc^{-1} & ac^{-1} \end{pmatrix}$$

for the row operations can be chosen to be of the form

$$\begin{pmatrix} 1 & 0 \\ -ba^{-1} & 1 \end{pmatrix};$$

the result is that the row operations do not change the entries of the first row.
Similar remarks apply to the matrices for the column operations. The upshot is
that we can reduce C in this way so that all entries of the first row and column
are 0 except the one in position (1, 1). This handles the inductive step, and we
can proceed until at some lth stage we have only the 0 matrix to process. □
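For R = Z the row and column reduction of Lemma 8.26 can be carried out explicitly. The Python sketch below implements the reduction under that assumption. It tracks A and B by applying every 2-by-2 operation to identity matrices as well as to C. It uses the general GCD block when the corner fails to divide an entry, and it uses the block with rows (1, 0) and (−ba⁻¹, 1) when the corner already divides it, as in the last step of the proof. Because |corner| strictly decreases each time the general block is used, the loop terminates. Like the lemma, the sketch only makes D diagonal. It does not arrange the divisibility D₁₁ | D₂₂ | ··· of a Smith normal form. All names are hypothetical.

```python
def extended_gcd(a, b):
    """Return (c, x, y) with x*a + y*b = c = gcd(a, b), c >= 0."""
    old_r, r, old_x, x, old_y, y = a, b, 1, 0, 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    if old_r < 0:
        old_r, old_x, old_y = -old_r, -old_x, -old_y
    return old_r, old_x, old_y


def elimination_matrix(a, b):
    """Determinant-1 block sending the pair (a, b), a != 0, to (gcd, 0)."""
    if b % a == 0:                      # corner divides b: pivot row/column unchanged
        return [[1, 0], [-(b // a), 1]]
    c, x, y = extended_gcd(a, b)
    return [[x, y], [-(b // c), a // c]]


SWAP = [[0, 1], [1, 0]]


def diagonalize(C):
    """Return (A, D, B) with D = A C B diagonal and det A, det B = +-1 (R = Z)."""
    m, n = len(C), len(C[0])
    D = [list(row) for row in C]
    A = [[int(i == j) for j in range(m)] for i in range(m)]
    B = [[int(i == j) for j in range(n)] for i in range(n)]

    def row_op(i, k, E):                # left-multiply rows i, k of D and A by E
        for T in (D, A):
            ri, rk = T[i], T[k]
            T[i] = [E[0][0] * s + E[0][1] * u for s, u in zip(ri, rk)]
            T[k] = [E[1][0] * s + E[1][1] * u for s, u in zip(ri, rk)]

    def col_op(j, k, E):                # the same combination on columns j, k of D and B
        for T in (D, B):
            for row in T:
                s, u = row[j], row[k]
                row[j], row[k] = E[0][0] * s + E[0][1] * u, E[1][0] * s + E[1][1] * u

    t = 0
    while t < min(m, n):
        pos = next(((i, j) for i in range(t, m) for j in range(t, n) if D[i][j] != 0), None)
        if pos is None:                 # remaining block is 0: l = t
            break
        if pos[0] != t:
            row_op(t, pos[0], SWAP)
        if pos[1] != t:
            col_op(t, pos[1], SWAP)
        changed = True
        while changed:                  # alternate row and column passes
            changed = False
            for i in range(t + 1, m):
                if D[i][t] != 0:
                    row_op(t, i, elimination_matrix(D[t][t], D[i][t]))
                    changed = True
            for j in range(t + 1, n):
                if D[t][j] != 0:
                    col_op(t, j, elimination_matrix(D[t][t], D[t][j]))
                    changed = True
        t += 1
    return A, D, B


A, D, B = diagonalize([[2, 4, 4], [-6, 6, 12], [10, -4, -16]])
```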

This  completes  the  proof  of  Theorem  8.25.  In  Theorem  4.56,  in  which  we 
considered  the  special  case  of  abelian  groups,  we  obtained  a  better  conclusion 
than  in  Theorem  8.25c:  we  showed  that  the  direct  sum  of  cyclic  groups  could 
be  written  as  the  direct  sum  of  copies  of  Z  and  of  cyclic  groups  of  prime-power 
order,  and  that  in  this  case  the  decomposition  was  unique  up  to  the  order  of  the 
summands.  We  shall  now  obtain  a  corresponding  better  conclusion  in  the  setting 
of  Theorem  8.25. 

The  existence  of  the  decomposition  into  cyclic  modules  of  a  special  kind  uses  a 
very  general  form  of  the  Chinese  Remainder  Theorem,  whose  classical  statement 
appears  as  Corollary  1.9.  The  generalization  below  makes  use  of  the  following 
operations  of  addition  and  multiplication  of  ideals  in  a  commutative  ring  with 
identity: if $I$ and $J$ are ideals, then $I + J$ denotes the set of sums $x + y$ with
$x \in I$ and $y \in J$, and $IJ$ denotes the set of all finite sums of products $xy$ with
$x \in I$ and $y \in J$; the sets $I + J$ and $IJ$ are ideals.

Theorem 8.27 (Chinese Remainder Theorem). Let $R$ be a commutative ring
with identity, and let $I_1, \dots, I_n$ be ideals in $R$ such that $I_i + I_j = R$ whenever
$i \ne j$.

(a) If elements $x_1, \dots, x_n$ of $R$ are given, then there exists $x$ in $R$ such that
$x \equiv x_j \bmod I_j$, i.e., $x - x_j$ is in $I_j$, for all $j$. The element $x$ is unique if
$I_1 \cap \cdots \cap I_n = 0$.

(b) The map $\varphi : R \to \prod_{j=1}^n R/I_j$ given by $\varphi(r) = (\dots, r + I_j, \dots)$ is an onto
ring homomorphism, its kernel is $\bigcap_{j=1}^n I_j$, and the homomorphism descends to a
ring isomorphism
$$R\Big/\bigcap_{j=1}^n I_j \cong R/I_1 \times \cdots \times R/I_n.$$

(c) The intersection $\bigcap_{j=1}^n I_j$ and the product $I_1 \cdots I_n$ coincide.

Proof. For existence in (a) when $n = 1$, we take $x = x_1$. For existence
when $n = 2$, the assumption $I_1 + I_2 = R$ implies that there exist $a_1 \in I_1$ and
$a_2 \in I_2$ with $a_1 + a_2 = 1$. Given $x_1$ and $x_2$, we put $x = x_1a_2 + x_2a_1$, and then
$x \equiv x_1a_2 \equiv x_1 \bmod I_1$ and $x \equiv x_2a_1 \equiv x_2 \bmod I_2$.

For general $n$, the assumption $I_1 + I_j = R$ for $j \ge 2$ implies that there
exist $a_j \in I_1$ and $b_j \in I_j$ with $a_j + b_j = 1$. If we expand out the product
$$1 = \prod_{j=2}^n (a_j + b_j),$$
then all terms but one on the right side involve some $a_j$
and are therefore in $I_1$. That one term is $b_2b_3 \cdots b_n$, and it is in $\bigcap_{j=2}^n I_j$. Thus
$$I_1 + \bigcap_{j=2}^n I_j = R.$$
The case $n = 2$, which was proved above, yields an element
$y_1$ in $R$ such that
$$y_1 \equiv 1 \bmod I_1 \quad\text{and}\quad y_1 \equiv 0 \bmod \bigcap_{j \ne 1} I_j.$$
Repeating this process for index $i$ and using the assumption $I_i + I_j = R$ for $j \ne i$,
we obtain an element $y_i$ in $R$ such that
$$y_i \equiv 1 \bmod I_i \quad\text{and}\quad y_i \equiv 0 \bmod \bigcap_{j \ne i} I_j.$$
If we put $x = x_1y_1 + \cdots + x_ny_n$, then we have $x \equiv x_iy_i \equiv x_i \bmod I_i$ for
each $i$, and the proof of existence is complete.

For uniqueness in (a), if we have two elements $x$ and $x'$ satisfying the
congruences, then their difference $x - x'$ lies in $I_j$ for every $j$, hence is 0 under the
assumption that $I_1 \cap \cdots \cap I_n = 0$.

In (b), the map $\varphi$ is certainly a ring homomorphism. The existence result in (a)
shows that $\varphi$ is onto, and the proof of the uniqueness result identifies the kernel.
The  isomorphism  follows. 

For (c), consider the special case that $I$ and $J$ are ideals with $I + J = R$.
Certainly $IJ \subseteq I \cap J$. For the reverse inclusion, choose $x \in I$ and $y \in J$ with
$x + y = 1$; this is possible since $I + J = R$. If $z$ is in $I \cap J$, then $z = zx + zy$
with $zx$ in $JI$ and $zy$ in $IJ$. Thus $z$ is exhibited as in $IJ$.

Consequently $I_1I_2 = I_1 \cap I_2$. Suppose inductively that $I_1 \cdots I_k = I_1 \cap \cdots \cap I_k$.
We saw in the proof of (a) that $I_{k+1} + \bigcap_{j \ne k+1} I_j = R$, and thus we certainly have
$I_{k+1} + \bigcap_{j=1}^k I_j = R$. The special case in the previous paragraph, in combination
with the inductive hypothesis, shows that $I_{k+1}I_1 \cdots I_k = I_{k+1} \cdot \bigl(\bigcap_{j=1}^k I_j\bigr) = \bigcap_{j=1}^{k+1} I_j$.
This completes the induction and the proof. □
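In $R = \mathbb{Z}$ with $I_j = (m_j)$, the hypothesis $I_i + I_j = R$ says that $\gcd(m_i, m_j) = 1$, and by (c) the ideal $\bigcap_{j \ne i} I_j$ is generated by the product of the other moduli. The Python sketch below follows the existence proof by building each $y_i$ and setting $x = \sum x_i y_i$; instead of exhibiting $a_j$ and $b_j$ explicitly it obtains $y_i$ from a modular inverse, computed with the built-in `pow(a, -1, m)` (available from Python 3.8). The function names `crt` and `lcm_all` are illustrative.

```python
from math import gcd, prod

def crt(residues, moduli):
    """Solve x ≡ residues[i] (mod moduli[i]) for pairwise coprime positive moduli.

    Mirrors the existence proof in R = Z: y_i ≡ 1 (mod m_i) and
    y_i ≡ 0 (mod every other m_j), then x = sum of x_i * y_i.
    """
    for i, m in enumerate(moduli):
        for n in moduli[i + 1:]:
            if gcd(m, n) != 1:
                raise ValueError("moduli must be pairwise coprime")
    M = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        rest = M // m                  # generates the intersection of the other ideals
        y = rest * pow(rest, -1, m)    # y ≡ 1 (mod m) and y ≡ 0 (mod rest)
        x += r * y
    return x % M

def lcm_all(moduli):
    """Generator of (m_1) ∩ ... ∩ (m_n) in Z."""
    out = 1
    for m in moduli:
        out = out * m // gcd(out, m)
    return out

print(crt([2, 3, 2], [3, 5, 7]))          # 23
print(lcm_all([3, 5, 7]), prod([3, 5, 7]))  # 105 105, as part (c) predicts
```

For pairwise coprime moduli the intersection (generated by the lcm) and the product ideal (generated by the product) agree, which is part (c) in this special case.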
Corollary 8.28. Let $R$ be a principal ideal domain, and let $a = \varepsilon\, p_1^{k_1} \cdots p_n^{k_n}$
be a factorization of a nonzero nonunit element $a$ into the product of a unit $\varepsilon$ and
powers of nonassociate primes. Then there is a ring isomorphism
$$R/(a) \cong R/(p_1^{k_1}) \times \cdots \times R/(p_n^{k_n}).$$

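Proof. Let $I_j = (p_j^{k_j})$ in Theorem 8.27. For $i \ne j$, we have $\mathrm{GCD}(p_i^{k_i}, p_j^{k_j}) = 1$.
Since $R$ is a principal ideal domain, there exist $s$ and $t$ in $R$ with $sp_i^{k_i} + tp_j^{k_j} = 1$,
and consequently $(p_i^{k_i}) + (p_j^{k_j}) = R$. The theorem applies; by (c) the intersection
$\bigcap_j I_j$ equals $I_1 \cdots I_n = (p_1^{k_1} \cdots p_n^{k_n}) = (a)$, and the corollary follows. □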
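A concrete instance in $R = \mathbb{Z}$: for $a = 360 = 2^3 \cdot 3^2 \cdot 5$ the corollary gives $\mathbb{Z}/(360) \cong \mathbb{Z}/(8) \times \mathbb{Z}/(9) \times \mathbb{Z}/(5)$ via $r \mapsto (r \bmod 8, r \bmod 9, r \bmod 5)$. The Python sketch below builds this map by trial-division factorization and lets one check that it is a bijection compatible with sums and products; the names `factor` and `crt_components` are illustrative, and the approach is only practical for small $a$.

```python
def factor(n):
    """Prime factorization {p: k} of an integer n > 1, by trial division."""
    f, p = {}, 2
    while p * p <= n:
        while n % p == 0:
            f[p] = f.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        f[n] = f.get(n, 0) + 1
    return f

def crt_components(a):
    """Moduli p^k and the map r + (a) -> (r + (p^k))_p of Corollary 8.28 for R = Z."""
    mods = [p ** k for p, k in sorted(factor(a).items())]
    phi = {r: tuple(r % m for m in mods) for r in range(a)}
    return mods, phi

mods, phi = crt_components(360)
print(mods)       # [8, 9, 5]
print(phi[23])    # (7, 5, 3)
```

Since both sides have 360 elements, injectivity of `phi` already gives the bijection.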

Corollary 8.29. If $R$ is a principal ideal domain, then any finitely generated
unital $R$ module $M$ is the direct sum of a nonunique free $R$ submodule $\bigoplus_{i=1}^s R$ of
a well-defined finite rank $s \ge 0$ and the $R$ submodule $T$ of all members $m$ of $M$
such that $rm = 0$ for some $r \ne 0$ in $R$. In turn, the $R$ submodule $T$ is isomorphic
to a direct sum
$$T \cong \bigoplus_{j=1}^n R/(p_j^{k_j}),$$
where the $p_j$ are primes in $R$ and the ideals $(p_j^{k_j})$ are not necessarily distinct. The
number of summands $R/(p^k)$ for each class of associate primes $p$ and each positive
integer $k$ is uniquely determined by $M$.

Remark.  As  mentioned  with  Theorem  8.25,  some  people  use  the  name 
“Fundamental  Theorem  of  Finitely  Generated  Modules”  to  refer  to  Corollary 
8.29  rather  than  to  Theorem  8.25. 

Proof. Theorem 8.25c gives $M = F \oplus \bigoplus_{j=1}^n Ra_j$, where $F$ is a free $R$
submodule of some finite rank $s$ and the $a_j$'s are nonzero members of $M$ that are
each annihilated by some nonzero member of $R$. The set $T$ of all $m$ with $rm = 0$
for some $r \ne 0$ in $R$ is exactly $\bigoplus_{j=1}^n Ra_j$. Then $F$ is $R$ isomorphic to $M/T$,
hence is isomorphic to the same free $R$ module independently of what direct-sum
decomposition of $M$ is used. By Theorem 8.25a, $s$ is well defined.

The cyclic $R$ module $Ra_j$ is isomorphic to $R/(b_j)$, where $(b_j)$ is the ideal of
all elements $r$ in $R$ with $ra_j = 0$. The ideal $(b_j)$ is nonzero by assumption and
is not all of $R$ since the element $r = 1$ has $1a_j = a_j \ne 0$. Applying Corollary
8.28 for each $j$ and adding the results, we obtain $T \cong \bigoplus_i R/(p_i^{k_i})$ for suitable
primes $p_i$ and powers $k_i$. The isomorphism in Corollary 8.28 is given as a ring
isomorphism, and we are reinterpreting it as an $R$ isomorphism. The primes
$p_i$ that arise for fixed $(b_j)$ are distinct, but there may be repetitions in the pairs
$(p_i, k_i)$ as $j$ varies. This proves existence of the decomposition.
If $p$ is a prime in $R$, then the elements $m$ of $T$ such that $p^km = 0$ for some $k$
are the ones corresponding to the sum of the terms in $\bigoplus_i R/(p_i^{k_i})$ in which $p_i$
is an associate of $p$. Thus, to complete the proof, it is enough to show that the $R$
isomorphism class of the $R$ module
$$N = R/(p^{l_1}) \oplus \cdots \oplus R/(p^{l_m})$$
with $p$ fixed and with $0 < l_1 \le \cdots \le l_m$ completely determines the integers
$l_1, \dots, l_m$.

For any unital $R$ module $L$, we can form the sequence of $R$ submodules
$p^jL$. The element $p$ carries $p^jL$ into $p^{j+1}L$, and thus each $p^jL/p^{j+1}L$ is an
$R$ module on which $p$ acts as 0. Consequently each $p^jL/p^{j+1}L$ is an $R/(p)$
module. Corollary 8.16 and Proposition 8.10 together show that $R/(p)$ is a field,
and therefore we can regard each $p^jL/p^{j+1}L$ as an $R/(p)$ vector space.

We shall show that the dimensions $\dim_{R/(p)}(p^jN/p^{j+1}N)$ of these vector
spaces determine the integers $l_1, \dots, l_m$. We start from
$$p^jN = p^jR/(p^{l_1}) \oplus \cdots \oplus p^jR/(p^{l_m}).$$
The term $p^jR/(p^{l_k})$ is 0 if $j \ge l_k$. Thus
$$p^jN = \bigoplus_{k:\, j < l_k} p^jR/(p^{l_k}) = \bigoplus_{k:\, j < l_k} p^jR/p^{l_k}R.$$

Similarly
$$p^{j+1}N = \bigoplus_{k:\, j < l_k} p^{j+1}R/(p^{l_k}) = \bigoplus_{k:\, j < l_k} p^{j+1}R/p^{l_k}R.$$

Proposition 8.5 and Theorem 8.3 give us the $R$ isomorphisms
$$p^jN/p^{j+1}N \cong \bigoplus_{k:\, j < l_k} (p^jR/p^{l_k}R)/(p^{j+1}R/p^{l_k}R) \cong \bigoplus_{k:\, j < l_k} p^jR/p^{j+1}R,$$
and these must descend to $R/(p)$ isomorphisms. Consequently
$$\dim_{R/(p)}(p^jN/p^{j+1}N) = \#\{k \mid l_k > j\}\,\dim_{R/(p)}(p^jR/p^{j+1}R).$$

The coset $p^j + p^{j+1}R$ of $p^jR/p^{j+1}R$ is nonzero, since $p$ is not a unit and $R$ is an
integral domain, and multiplying it by arbitrary elements of $R$ yields all of
$p^jR/p^{j+1}R$. Therefore $\dim_{R/(p)}(p^jR/p^{j+1}R) = 1$, and we obtain
$$\dim_{R/(p)}(p^jN/p^{j+1}N) = \#\{k \mid l_k > j\}.$$
Thus the $R$ module $N$ determines the integers on the right side, and these determine
the number of $l_k$'s equal to each positive integer $j$, namely
$\#\{k \mid l_k > j - 1\} - \#\{k \mid l_k > j\}$. This proves uniqueness. □
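For $R = \mathbb{Z}$ and a small finite abelian $p$-group $N = \mathbb{Z}/(p^{l_1}) \oplus \cdots \oplus \mathbb{Z}/(p^{l_m})$, the uniqueness argument can be replayed by brute force: compute $p^jN$ as a set, note that $|p^jN/p^{j+1}N| = p^{d_j}$ with $d_j = \dim(p^jN/p^{j+1}N)$, and read off the multiplicities as $d_{j-1} - d_j$. The Python sketch below enumerates all of $N$, so it is meant only for tiny examples; the helper names `multiples`, `dims`, and `recover_exponents` are hypothetical.

```python
from itertools import product

def multiples(p, ls, j):
    """The subgroup p^j N of N = Z/(p^l_1) x ... x Z/(p^l_m), computed by brute force."""
    mods = [p ** e for e in ls]
    return {tuple((p ** j * x) % m for x, m in zip(elt, mods))
            for elt in product(*(range(m) for m in mods))}

def dims(p, ls):
    """d_j = dim over Z/(p) of p^j N / p^(j+1) N, for j = 0, ..., max(ls)."""
    sizes = [len(multiples(p, ls, j)) for j in range(max(ls) + 2)]
    out = []
    for j in range(max(ls) + 1):
        index, d = sizes[j] // sizes[j + 1], 0
        while index > 1:          # the index is a power of p; take its base-p logarithm
            index //= p
            d += 1
        out.append(d)
    return out

def recover_exponents(ds):
    """Number of l_k equal to j, read off as d_(j-1) - d_j (zero counts omitted)."""
    return {j: ds[j - 1] - ds[j] for j in range(1, len(ds)) if ds[j - 1] != ds[j]}

ds = dims(3, [1, 1, 2, 4])
print(ds)                      # [4, 2, 1, 1, 0]
print(recover_exponents(ds))   # {1: 2, 2: 1, 4: 1}
```

Each $d_j$ equals $\#\{k \mid l_k > j\}$, as in the displayed formula, and the exponents come back exactly.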
Let us apply Theorem 8.25 and Corollary 8.29 to the principal ideal domain
$R = K[X]$, where $K$ is a field. The particular unital module of interest is a
finite-dimensional vector space $V$ over $K$, and the scalar multiplication by $K[X]$
is given by $A(X)v = A(L)(v)$ for each polynomial $A(X)$, where $L$ is a fixed
linear map $L : V \to V$. Let us see that the results of this section recover the
structure  theory  of  L  as  developed  in  Chapter  V. 

Since  V  is  finite-dimensional  over  K,  V  is  certainly  finitely  generated  over 
R  =  K[X].  Theorem  8.25  gives 

$$V \cong R/(A_1(X)) \oplus \cdots \oplus R/(A_n(X)) \oplus R \oplus \cdots \oplus R$$

as $R$ modules and in particular as vector spaces over $K$. Each summand $R$ is
infinite-dimensional as a vector space, and consequently no summand $R$ can be
present. Corollary 8.29 refines the decomposition to the form

$$V \cong R/(P_1(X)^{k_1}) \oplus \cdots \oplus R/(P_m(X)^{k_m})$$

as $R$ modules, the polynomials $P_j(X)$ being prime but not necessarily distinct.
Since the $R$ isomorphism is in particular an isomorphism of $K$ vector spaces,
each $R/(P_j(X)^{k_j})$ corresponds to a vector subspace $V_j$, and $V = V_1 \oplus \cdots \oplus V_m$.
Since the $R$ isomorphism respects the action by $X$, we have $L(V_j) \subseteq V_j$ for each
$j$. Thus the direct sum decompositions of Theorem 8.25 and Corollary 8.29
yield a decomposition of $V$ into a direct sum of vector subspaces invariant
under $L$. Since the $j$th summand is of the form $R/(P_j(X)^{k_j})$, $L$ acts on $V_j$ in a
particular way, which we have to analyze.

Let  us  carry  out  this  analysis  in  the  case  that  K  is  algebraically  closed  (as  for 
example when $K = \mathbb{C}$), seeing that each $V_j$ yields a Jordan block of the Jordan
canonical  form  (Theorem  5.20a)  of  L.  For  the  case  of  general  K,  the  analysis 
can  be  seen  to  lead  to  the  corresponding  more  general  results  that  were  obtained 
in  Problems  32-40  at  the  end  of  Chapter  V. 

Since $K$ is algebraically closed, any polynomial in $K[X]$ of degree $\ge 1$ has a
root in $K$ and therefore has a first-degree factor $X - c$. Consequently all primes
in $K[X]$ are of the form $X - c$, up to a scalar factor, with $c$ in $K$. To understand
the action of $L$ on $V_j$, we are to investigate $K[X]/((X - c)^k)$.

Suppose that $A(X)$ is in $K[X]$ and is of degree $n \ge 1$. Expanding the
monomials of $A(X)$ by the Binomial Theorem as
$$X^l = ((X - c) + c)^l = \sum_{i=0}^{l} \binom{l}{i} c^{\,l-i} (X - c)^i,$$
we see that $A(X)$ has an expansion as
$$A(X) = a_0 + a_1(X - c) + \cdots + a_n(X - c)^n$$
for suitable coefficients $a_0, \dots, a_n$ in $K$. Let the invariant subspace that we are
studying be $V_{j_0} \subseteq V$. Since $V_{j_0}$ is isomorphic as an $R$ module to $K[X]/((X - c)^k)$,
$(X - c)^k$ acts on $V_{j_0}$ as 0. So does every higher power of $X - c$, and hence
$$A(X) \text{ acts as } a_0 + a_1(X - c) + \cdots + a_{k-1}(X - c)^{k-1}.$$
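The coefficients $a_i$ can be computed exactly as described, by expanding each monomial $X^m = ((X - c) + c)^m$ with binomial coefficients. The Python sketch below does this for integer coefficients (exact field arithmetic such as `fractions.Fraction` would work the same way); `expand_about` and `evaluate` are illustrative names. Keeping only the first $k$ coefficients gives the operator by which $A(X)$ acts on $V_{j_0}$.

```python
from math import comb

def expand_about(coeffs, c):
    """Rewrite A(X) = sum coeffs[m] * X**m as sum a[i] * (X - c)**i.

    Uses X**m = ((X - c) + c)**m = sum_i C(m, i) * c**(m - i) * (X - c)**i.
    """
    a = [0] * len(coeffs)
    for m, coef in enumerate(coeffs):
        for i in range(m + 1):
            a[i] += coef * comb(m, i) * c ** (m - i)
    return a

def evaluate(coeffs, x, center=0):
    """Value of sum coeffs[i] * (x - center)**i."""
    return sum(coef * (x - center) ** i for i, coef in enumerate(coeffs))

A = [5, -3, 0, 2]        # A(X) = 2X^3 - 3X + 5
a = expand_about(A, 2)
print(a)                 # [15, 21, 12, 2]
print(a[:2])             # with k = 2, A(X) acts on V_{j0} as 15 + 21 (X - 2)
```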

The polynomials on the right, as their coefficients vary, represent distinct
cosets of $K[X]/((X - c)^k)$: in fact, if two were to be in the same coset, we could
subtract and see that $(X - c)^k$ could not divide the difference unless it were 0.
The distinct cosets match in one-one $K$ linear fashion with the members of $V_{j_0}$,
and thus $\dim V_{j_0} = k$. Let us write down this match. Let $v_0$ be the member of
$V_{j_0}$ that is to correspond to the coset 1 of $K[X]/((X - c)^k)$. On $V_{j_0}$, $K[X]$ is acting
with $Xv = L(v)$. We define recursively vectors $v_1, \dots, v_{k-1}$ of $V_{j_0}$ by
$$\begin{aligned}
v_1 &= (L - cI)v_0 = (X - c)v_0 &&\longleftrightarrow\ (X - c)\cdot 1 = X - c,\\
v_2 &= (L - cI)v_1 = (X - c)v_1 &&\longleftrightarrow\ (X - c)\cdot(X - c) = (X - c)^2,\\
&\ \ \vdots\\
v_{k-1} &= (L - cI)v_{k-2} = (X - c)v_{k-2} &&\longleftrightarrow\ (X - c)\cdot(X - c)^{k-2} = (X - c)^{k-1},\\
(L - cI)v_{k-1} &= (X - c)v_{k-1} &&\longleftrightarrow\ (X - c)\cdot(X - c)^{k-1} = (X - c)^k = 0.
\end{aligned}$$


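We conclude from this correspondence that the vectors $v_0, v_1, \dots, v_{k-1}$ form a
basis of $V_{j_0}$ and that the matrix of $L - cI$ in the ordered basis $v_{k-1}, \dots, v_1, v_0$ is
$$\begin{pmatrix}
0 & 1 & 0 & \cdots & 0 & 0\\
0 & 0 & 1 & \cdots & 0 & 0\\
\vdots & & & \ddots & & \vdots\\
0 & 0 & 0 & \cdots & 0 & 1\\
0 & 0 & 0 & \cdots & 0 & 0
\end{pmatrix}.$$
Hence the matrix of $L$ in the same ordered basis is
$$\begin{pmatrix}
c & 1 & 0 & \cdots & 0 & 0\\
0 & c & 1 & \cdots & 0 & 0\\
\vdots & & & \ddots & & \vdots\\
0 & 0 & 0 & \cdots & c & 1\\
0 & 0 & 0 & \cdots & 0 & c
\end{pmatrix},$$
i.e., is a Jordan block. Thus Theorem 8.25 and Corollary 8.29 indeed establish
the existence of Jordan canonical form (Theorem 5.20a) when $K$ is algebraically
closed. It is easy to check that Corollary 8.29 establishes also the uniqueness
statement in Theorem 5.20a.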
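The Jordan block can be checked numerically. Identifying the ordered basis $v_{k-1}, \dots, v_1, v_0$ with the standard basis $e_1, \dots, e_k$, the vector $v_0 = e_k$ generates the chain $v_i = (L - cI)v_{i-1} = e_{k-i}$, and $(L - cI)^k = 0$ while $(L - cI)^{k-1} \ne 0$, matching the annihilator $(X - c)^k$ of $K[X]/((X - c)^k)$. The Python sketch below uses plain lists for matrices; all function names are illustrative.

```python
def jordan_block(c, k):
    """k-by-k matrix with c on the diagonal and 1 on the superdiagonal."""
    return [[c if j == i else (1 if j == i + 1 else 0) for j in range(k)] for i in range(k)]

def matmul(A, B):
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def nilpotency_index(N):
    """Smallest e >= 1 with N**e == 0; raises if N is not nilpotent."""
    P = N
    for e in range(1, len(N) + 1):
        if all(x == 0 for row in P for x in row):
            return e
        P = matmul(P, N)
    raise ValueError("matrix is not nilpotent")

c, k = 3, 4
J = jordan_block(c, k)
N = [[J[i][j] - (c if i == j else 0) for j in range(k)] for i in range(k)]  # L - cI

# v_0 = e_k; applying L - cI repeatedly walks down the basis to e_1 and then to 0.
v = [0] * (k - 1) + [1]
chain = [v]
for _ in range(k):
    v = matvec(N, v)
    chain.append(v)
print(nilpotency_index(N))   # 4
```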
7. Orientation for Algebraic Number Theory and Algebraic Geometry

The remainder of the chapter introduces material on commutative rings with
identity that is foundational for both algebraic number theory and algebraic geometry.
Historically algebraic number theory grew out of Diophantine equations, particularly
from two problems: Fermat’s Last Theorem and the representation of integers
by binary quadratic forms. Algebraic geometry grew out of studying
the geometry of solutions of equations and out of studying Riemann surfaces.
Algebraic  geometry  and  algebraic  number  theory  are  treated  in  more  detail  in 
Advanced  Algebra. 

These  two  subjects  can  be  studied  on  their  own,  but  they  also  have  a  great 
deal  in  common.  The  discovery  that  the  plane  could  be  coordinatized  and  that 
geometry  could  be  approached  through  algebra  was  one  of  the  great  advances  of 
all  time  for  mathematics.  Since  then,  fundamental  connections  between  algebraic 
number  theory  and  algebraic  geometry  have  been  discovered  at  a  deeper  level, 
and  the  distinction  between  the  two  subjects  is  more  and  more  just  a  question  of 
one’s  point  of  view.  The  emphasis  in  the  remainder  of  this  chapter  will  be  on 
one  aspect  of  this  relationship,  the  theory  that  emerged  from  trying  to  salvage 
something  in  the  way  of  unique  factorization. 

By  way  of  illustration,  let  us  examine  an  analogy  between  what  happens  with 
a  certain  ring  of  “algebraic  integers”  and  what  happens  with  a  certain  “algebraic 
curve.”  The  ring  of  algebraic  integers  in  question  was  introduced  already  in 
Section 4. It is $R = \mathbb{Z}[\sqrt{-5}] = \mathbb{Z} + \mathbb{Z}\sqrt{-5}$. The units are $\pm 1$. Our investigation
of unique factorization was aided by the function
$$N(a + b\sqrt{-5}) = (a + b\sqrt{-5})(a - b\sqrt{-5}) = a^2 + 5b^2,$$
which has the property that
$$N\big((a + b\sqrt{-5})(c + d\sqrt{-5})\big) = N(a + b\sqrt{-5})\,N(c + d\sqrt{-5}).$$
With this function we could determine candidates for factors of particular
elements. In connection with the equality $2 \cdot 3 = (1 + \sqrt{-5})(1 - \sqrt{-5})$, we
saw  that  the  two  factors  on  the  left  side  and  the  two  factors  on  the  right  side  are 
all  irreducible.  Moreover,  neither  factor  on  the  left  is  the  product  of  a  unit  and 
a  factor  on  the  right.  Therefore  R  is  not  a  unique  factorization  domain.  As  a 
consequence it cannot be a principal ideal domain. In fact, $(2, 1 + \sqrt{-5})$ is an
example  of  an  ideal  that  is  not  principal.  We  shall  return  shortly  to  examine  this 
ring  further. 
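The norm argument for irreducibility can be checked mechanically. The units $\pm 1$ have norm 1, so writing an element of norm 4, 9, or 6 as a product of two nonunits would require a factor of norm 2 or 3, and a finite search shows that $a^2 + 5b^2$ never takes either value. In the Python sketch below, elements $a + b\sqrt{-5}$ are encoded as integer pairs $(a, b)$; this encoding and the function names are choices made here for illustration.

```python
from math import isqrt

def norm(x):
    """N(a + b*sqrt(-5)) = a^2 + 5 b^2 for x = (a, b)."""
    a, b = x
    return a * a + 5 * b * b

def mul(x, y):
    """(a + b√-5)(c + d√-5) = (ac - 5bd) + (ad + bc)√-5."""
    (a, b), (c, d) = x, y
    return (a * c - 5 * b * d, a * d + b * c)

def elements_of_norm(n):
    """All (a, b) with a^2 + 5 b^2 == n; the search is finite since a^2 <= n."""
    r = isqrt(n)
    return [(a, b) for a in range(-r, r + 1) for b in range(-r, r + 1) if norm((a, b)) == n]

for x in [(2, 0), (3, 0), (1, 1), (1, -1)]:
    print(x, norm(x))                            # norms 4, 9, 6, 6
print(elements_of_norm(2), elements_of_norm(3))  # [] []
```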

Now we introduce the algebraic curve. Consider $y^2 = (x - 1)x(x + 1)$ as
an equation in two variables $x$ and $y$. To fix the ideas, we think of a solution as
a pair $(x, y)$ of complex numbers. Although the variables in this discussion are
complex, it is convenient to be able to draw pictures of the solutions, and one does
this by showing only the solutions $(x, y)$ with $x$ and $y$ in $\mathbb{R}$. Figure 8.6 indicates
the set of solutions in $\mathbb{R}^2$ for this particular curve. We can study these solutions
for  a  while,  looking  for  those  pairs  (x,  y)  with  x  and  y  rationals  or  integers,  but 
a  different  level  of  understanding  comes  from  studying  functions  on  the  locus  of 
complex  solutions.  The  functions  of  interest  are  polynomial  functions  in  the  pair 
(x,  y),  and  we  identify  two  of  them  if  they  agree  on  the  locus.  Thus  we  introduce 
the  ring 

$$R' = \mathbb{C}[x, y]/\big(y^2 - (x - 1)x(x + 1)\big).$$

There is a bit of a question whether this is indeed the space of restrictions, but
that question is settled affirmatively by the “Nullstellensatz” in Section VII.1 of
Advanced Algebra and a verification that the principal ideal $(y^2 - (x - 1)x(x + 1))$
is prime.⁸ The ring $R'$ is called the “affine coordinate ring” of the curve, and the
curve itself is an example of an “affine algebraic curve.”


Figure 8.6. Real points of the curve $y^2 = (x - 1)x(x + 1)$.

We can recover the locus of the curve from the ring $R'$ as follows. If $(x_0, y_0)$ is a
point of the curve, then it is meaningful to evaluate members of $R'$ at $(x_0, y_0)$, and
we let $I_{(x_0, y_0)}$ be the ideal of all members of $R'$ vanishing at $(x_0, y_0)$. Evaluation
at $(x_0, y_0)$ exhibits the ring $R'/I_{(x_0, y_0)}$ as isomorphic to $\mathbb{C}$, which is a field. Thus
$I_{(x_0, y_0)}$ is a maximal ideal and is in particular prime. It turns out for this example
that all nonzero prime ideals are of this form.⁹ We return to make use of this
geometric interpretation of prime ideals in a moment.


⁸The polynomial $y^2 - (x - 1)x(x + 1)$ is prime since $(x - 1)x(x + 1)$ is not a square, or since
Eisenstein’s criterion applies. The principal ideal $(y^2 - (x - 1)x(x + 1))$ is therefore prime by
Proposition 8.14. What the Nullstellensatz says when the underlying field is algebraically closed is
that the only polynomials vanishing on the zero locus of a prime ideal are the members of the ideal.

⁹In Section 9, Example 3 of integral closures in combination with Proposition 8.45 shows that
every nonzero prime ideal of $R'$ is maximal. (In algebraic geometry one finds that this property of
prime ideals is a reflection of the 1-dimensional nature of the curve.) The Nullstellensatz says that
the maximal ideals are all of the form $I_{(x_0, y_0)}$.
Now let us consider factorization in $R'$. Every element of $R'$ can be written
uniquely as $A(x) + B(x)y$, where $A(x)$ and $B(x)$ are polynomials. The analog
in $R'$ of the quantity $N(a + b\sqrt{-5})$ in the ring $R$ is the quantity
$$\begin{aligned}
N(A(x) + B(x)y) &= (A(x) + B(x)y)(A(x) - B(x)y)\\
&= A(x)^2 - B(x)^2y^2\\
&= A(x)^2 - B(x)^2(x^3 - x).
\end{aligned}$$
Easy computation shows that
$$N\big((A(x) + B(x)y)(C(x) + D(x)y)\big) = N(A(x) + B(x)y)\,N(C(x) + D(x)y),$$
and hence $N(\cdot)$ gives us a device to use to check whether elements of $R'$ are
irreducible. We find in the equation
$$(x + y)(x - y) = x^2 - (x^3 - x) = -x\big(x - \tfrac12(1 + \sqrt5)\big)\big(x - \tfrac12(1 - \sqrt5)\big)$$
that the two elements on the left side and the three elements on the right side are
irreducible. Therefore unique factorization fails in $R'$.
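The multiplicativity of $N$ on $R'$ is a polynomial identity, and it can be spot-checked by encoding $A(x) + B(x)y$ as a pair of coefficient lists and reducing $y^2$ to $x^3 - x$. The Python sketch below is a minimal illustration with integer coefficients only (the text works over $\mathbb{C}$), coefficient lists written from $x^0$ upward, and hypothetical helper names; it also recomputes the norm $x^2 - (x^3 - x)$ of $x + y$.

```python
F = [0, -1, 0, 1]  # x^3 - x, so that y^2 = F in R' (coefficients from x^0 upward)

def trim(p):
    """Drop trailing zero coefficients."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def padd(p, q):
    n = max(len(p), len(q))
    return trim((p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0) for i in range(n))

def pmul(p, q):
    if not p or not q:
        return []
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return trim(r)

def pneg(p):
    return [-a for a in p]

def rmul(u, v):
    """(A + B y)(C + D y) = (AC + BD F) + (AD + BC) y, using y^2 = F."""
    (A, B), (C, D) = u, v
    return (padd(pmul(A, C), pmul(pmul(B, D), F)), padd(pmul(A, D), pmul(B, C)))

def norm(u):
    """N(A + B y) = A^2 - B^2 F, a polynomial in x."""
    A, B = u
    return padd(pmul(A, A), pneg(pmul(pmul(B, B), F)))

print(norm(([0, 1], [1])))   # [0, 1, 1, -1], i.e. x + x^2 - x^3 = x^2 - (x^3 - x)
```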

Although  unique  factorization  fails  for  the  elements  of  R' ,  there  is  a  notion 
of  factorization  for  ideals  in  R'  that  behaves  well  algebraically  and  has  a  nice 
geometric interpretation. Recall that the nonzero prime ideals correspond to the
points of the locus $y^2 = (x - 1)x(x + 1)$ via passage to the zero locus, the ideal
corresponding to $(x_0, y_0)$ being called $I_{(x_0, y_0)}$. For any two ideals $I$ and $J$, we
can form the product ideal $IJ$ whose elements are the sums of products of a
member of $I$ and a member of $J$. Then $I_{(x_0, y_0)}^k$ may be interpreted as the ideal of
all members of $R'$ vanishing at $(x_0, y_0)$ to order $k$ or higher, and $I_{(x_1, y_1)}^{k_1} \cdots I_{(x_r, y_r)}^{k_r}$
becomes the ideal of all members of $R'$ vanishing at each $(x_j, y_j)$ to order at
least $k_j$. We shall see in Section 11 that every nonzero proper ideal $I$ in $R'$
factors in this way. The points $(x_j, y_j)$ and the integers $k_j$ have a geometric
interpretation in terms of $I$ and are therefore uniquely determined: the $(x_j, y_j)$'s
form the locus of common zeros of the members of $I$, and the integer $k_j$ is the
greatest integer such that the vanishing at $(x_j, y_j)$ is always at least to order $k_j$. In
a  sense,  factorization  of  elements  was  the  wrong  thing  to  consider;  the  right  thing 
to  consider  is  factorization  of  ideals,  which  is  unique  because  of  the  associated 
geometric  interpretation. 

Returning to the ring $R = \mathbb{Z}[\sqrt{-5}]$, we can ask whether factorization of ideals
is a useful notion in $R$. Again $IJ$ is to be the set of all sums of products of an
element in $I$ and an element in $J$. For $I = (2, 1 + \sqrt{-5})$ and $J = (2, 1 - \sqrt{-5})$,
we get all sums of expressions $(2a + b(1 + \sqrt{-5}))(2c + d(1 - \sqrt{-5}))$ in which
$a, b, c, d$ are in $\mathbb{Z}$, hence all sums of expressions
$$2(2ac + 3bd) + 2(bc + ad) + 2\sqrt{-5}\,(bc - ad).$$
All such elements are divisible by 2. Two examples come by taking $a = c = 1$
and $b = d = 0$ and by taking $a = c = 0$ and $b = d = 1$; these give 4 and 6.
Subtracting, we see that 2 is a sum of products. Thus $IJ = (2)$. The element 2
is irreducible and not prime, and we know from Proposition 8.14 that the ideal
(2) therefore cannot be prime. What we find is that the ideal (2) factors even
though the element 2 does not factor. It turns out that $R$ has unique factorization
of ideals, just the way $R'$ does.
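The computation of $IJ$ can be replayed on the four products of generators, which generate $IJ$ as an ideal: all four are divisible by 2 in $R$, and two of them differ by 2. The Python sketch below encodes $a + b\sqrt{-5}$ as the integer pair $(a, b)$, an encoding chosen here for illustration.

```python
def mul(x, y):
    """(a + b√-5)(c + d√-5) = (ac - 5bd) + (ad + bc)√-5, elements as pairs (a, b)."""
    (a, b), (c, d) = x, y
    return (a * c - 5 * b * d, a * d + b * c)

I_gens = [(2, 0), (1, 1)]    # 2 and 1 + √-5
J_gens = [(2, 0), (1, -1)]   # 2 and 1 - √-5
products = [mul(u, v) for u in I_gens for v in J_gens]
print(products)              # [(4, 0), (2, -2), (2, 2), (6, 0)]

# a + b√-5 is divisible by 2 in R exactly when a and b are even, so IJ ⊆ (2) ...
print(all(a % 2 == 0 and b % 2 == 0 for a, b in products))   # True
# ... and 6 - 4 = 2 lies in IJ, so (2) ⊆ IJ.
print(products[3][0] - products[0][0])                        # 2
```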

The  prime  ideals  of  the  ring  R  have  a  certain  amount  of  structure  in  terms  of 
the  primes  or  prime  ideals  of  Z.  To  understand  what  to  expect,  let  us  digress  for 
a moment to discuss what happens with the ring $R'' = \mathbb{Z}[i] = \mathbb{Z} + \mathbb{Z}\sqrt{-1}$ of
Gaussian integers. This too was introduced in Section 4, and it is a Euclidean domain,
hence a principal ideal domain. It has unique factorization. Its appropriate
$N(\cdot)$ function is $N(a + ib) = a^2 + b^2$. Problems 27-31 at the end of the chapter
ask one to verify that the primes of $R''$, up to multiplication by one of the units
$\pm 1$ and $\pm i$, are members of $R''$ of any of the three kinds
$p = 4n + 3$ that is prime in $\mathbb{Z}$ and has $n \ge 0$,
$p = a \pm ib$ with $a^2 + b^2$ prime in $\mathbb{Z}$ of the form $4n + 1$ with $n > 0$,
$p = 1 \pm i$ (these are associates).

These three kinds may be distinguished by what happens to the function $N(\cdot)$.
In the first case $N(p) = p^2$ is the square of a prime of $\mathbb{Z}$ and is the square of
a prime of $R''$, in the second case $N(p)$ is a prime of $\mathbb{Z}$ that is the product of
two distinct primes of $R''$, and in the third case $N(p)$ is a prime of $\mathbb{Z}$ that is the
square of a prime of $R''$, apart from a unit factor. The nonzero prime ideals of $R''$
are the principal ideals generated by the prime elements of $R''$, and they fall into
three types as well. Each nonzero prime ideal $P$ has a prime $p$ of $\mathbb{Z}$ attached to
it, namely the one with $(p) = \mathbb{Z} \cap P$, and the type of the ideal corresponds to the
nature of the factorization of the ideal $pR''$ of $R''$. Specifically in the first case
$pR''$ is a prime ideal in $R''$, in the second case $pR''$ is the product of two distinct
prime ideals in $R''$, and in the third case $pR''$ is the square of a prime ideal in $R''$.
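In $\mathbb{Z}[i]$ the three behaviors of a rational prime $p$ can be told apart by whether $p$ is a norm $a^2 + b^2$. If it is not, $p$ stays prime. If it is and $p$ is odd, $p = (a + ib)(a - ib)$ with nonassociate factors. Finally, $2 = -i(1 + i)^2$. The Python sketch below uses a brute-force search (fine for small $p$, with hypothetical function names) and compares the outcome with the residue of $p$ modulo 4, as in the list of the three kinds.

```python
from math import isqrt

def two_squares(p):
    """Return (a, b) with a*a + b*b == p and 0 < a <= b, or None."""
    a = 1
    while 2 * a * a <= p:
        b = isqrt(p - a * a)
        if a * a + b * b == p:
            return (a, b)
        a += 1
    return None

def behavior(p):
    """How pZ[i] factors, for a rational prime p."""
    if p == 2:
        return "square"      # 2 = -i (1 + i)^2
    if two_squares(p) is not None:
        return "split"       # p = (a + ib)(a - ib)
    return "prime"           # p remains prime in Z[i]

def is_prime(n):
    return n > 1 and all(n % d for d in range(2, isqrt(n) + 1))

for p in filter(is_prime, range(2, 30)):
    print(p, p % 4, behavior(p), two_squares(p))
```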

The structure of the prime ideals in $R$ is of the same nature as with $R''$.
Each nonzero prime ideal $P$ has a prime $p$ of $\mathbb{Z}$ attached to it, again given
by $(p) = \mathbb{Z} \cap P$, and the three kinds correspond to the factorization of the ideal
$pR$ of $R$. Let us be content to give examples of the three possible behaviors:

$11R$ is prime in $R$,

$2R$ is the product of two distinct prime ideals in $R$,

$5R$ is the square of the prime ideal $(\sqrt{-5})$ in $R$.

We have already seen the decomposition of $2R$, and the decomposition of $5R$ is
easy to check. With $11R$, the idea is to show that 11 is a prime element in $R$.
Thus let 11 divide a product in $R$. Then $N(11) = 11^2$ divides the product of
the $N(\cdot)$'s, hence so does 11, and 11 must divide one of the
$N(\cdot)$'s. Say that 11 divides $N(a + b\sqrt{-5})$, i.e., that $a^2 + 5b^2 \equiv 0 \bmod 11$. If
11 divides one of $a$ or $b$, then this congruence shows that 11 divides the other of
them; then 11 divides $a + b\sqrt{-5}$, as we wanted to show. The other possibility
is that 11 divides neither $a$ nor $b$. Then $(ab^{-1})^2 \equiv -5 \bmod 11$ says that $-5$ is a
square modulo 11, and we readily check that it is not. The conclusion is that 11
is indeed prime in $R$.
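The final check, that $-5$ is not a square modulo 11, together with its consequence that $a^2 + 5b^2 \equiv 0 \pmod{11}$ forces $a \equiv b \equiv 0$, can be confirmed by exhaustive search. The short Python sketch below does exactly that for $p = 11$ and nothing more.

```python
p = 11
squares = sorted({x * x % p for x in range(1, p)})
print(squares)          # [1, 3, 4, 5, 9]
print((-5) % p)         # 6, which is not in the list

# Residues (a, b) mod 11 with a^2 + 5 b^2 ≡ 0: only the trivial pair survives.
solutions = [(a, b) for a in range(p) for b in range(p) if (a * a + 5 * b * b) % p == 0]
print(solutions)        # [(0, 0)]
```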

This structure for the prime ideals of $R$ has an analog with the curve and its
ring $R'$. The analogs for the curve case of $\mathbb{Z}$ and $\sqrt{-5}$ for the number-theoretic
case are $\mathbb{C}[x]$ and $y$. The primes of $\mathbb{C}[x]$ are nonzero scalars times polynomials
$x - c$ with $c$ complex, and the relevant question for $R'$ is how the ideal $(x - c)R'$
decomposes into prime ideals. We can think about this problem algebraically or
geometrically. Algebraically, the ideal of all polynomials vanishing at $(x_0, y_0)$ is
$I_{(x_0, y_0)} = (x - x_0, y - y_0)$, the set of all $(x - x_0)A(x) - y_0B(x) + yB(x)$ with $A(x)$
and $B(x)$ in $\mathbb{C}[x]$. The intersection with $\mathbb{C}[x]$ consists of all $(x - x_0)A(x)$ and is
therefore the principal ideal $(x - x_0)$. We want to factor the ideal $(x - x_0)R'$.

If we pause for a moment and think about the problem geometrically, the answer
is fairly clear. Ideals correspond to zero loci with multiplicities. The question
is the factorization of the ideal of all polynomials vanishing when $x = x_0$. For
most values of the complex number $x_0$, there are two choices of the complex $y$
such that $(x_0, y)$ is on the locus since $y$ is given by a quadratic equation, namely
$y^2 = (x_0 - 1)x_0(x_0 + 1)$. Thus for most values of $x_0$, $(x - x_0)R'$ is the product
of two distinct prime ideals. The geometry thus suggests that
$$(x - x_0)R' = (x - x_0, y - y_0)(x - x_0, y + y_0),$$
where $y_0^2 = (x_0 - 1)x_0(x_0 + 1)$ and it is assumed that $y_0 \ne 0$. We can verify this
algebraically: The members of the product ideal are the sums of the polynomials

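$$\begin{aligned}
&\big((x - x_0)A(x) + (y - y_0)B(x)\big)\big((x - x_0)C(x) + (y + y_0)D(x)\big)\\
&\quad= (x - x_0)^2A(x)C(x) + (x - x_0)\big(A(x)(y + y_0)D(x) + C(x)(y - y_0)B(x)\big)\\
&\qquad + (y^2 - y_0^2)B(x)D(x).
\end{aligned}$$

The last term on the right side is $((x^3 - x) - (x_0^3 - x_0))B(x)D(x)$ and is divisible
by $x - x_0$. Therefore every member of the product ideal lies in the principal
ideal $(x - x_0)$. On the other hand, the product ideal contains $(x - x_0)(x - x_0)$
and also $y^2 - y_0^2 = (x^3 - x_0^3) - (x - x_0) = (x - x_0)(x^2 + xx_0 + x_0^2 - 1)$. Since
$\mathrm{GCD}(x - x_0,\ x^2 + xx_0 + x_0^2 - 1) = 1$ whenever $3x_0^2 \ne 1$, the product ideal then contains
$x - x_0$; if $3x_0^2 = 1$, it still contains $x - x_0$, because it contains
$(x - x_0)(y + y_0) - (y - y_0)(x - x_0) = 2y_0(x - x_0)$ and $y_0 \ne 0$. Therefore
the product ideal equals $(x - x_0)$.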
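The identity $y^2 - y_0^2 = (x - x_0)(x^2 + xx_0 + x_0^2 - 1)$ on the curve, and the value $3x_0^2 - 1$ of the cofactor at $x = x_0$, can be checked with exact rational arithmetic via Python's `fractions.Fraction`. Both sides are polynomials of degree at most 3 in each of $x$ and $x_0$, and a polynomial of that kind vanishing on a 19-by-19 grid is zero, so the grid check below is in fact a proof of the identity. The function names are illustrative.

```python
from fractions import Fraction

def y2_minus_y02(x, x0):
    """y^2 - y0^2 on the curve, written in x: (x^3 - x) - (x0^3 - x0)."""
    return (x**3 - x) - (x0**3 - x0)

def factored(x, x0):
    """The factorization (x - x0)(x^2 + x x0 + x0^2 - 1)."""
    return (x - x0) * (x**2 + x * x0 + x0**2 - 1)

pts = [Fraction(n, 3) for n in range(-9, 10)]
print(all(y2_minus_y02(x, x0) == factored(x, x0) for x in pts for x0 in pts))  # True

# The cofactor at x = x0 is 3*x0**2 - 1; it is nonzero at each exceptional point.
for x0 in [-1, 0, 1]:
    print(x0, 3 * x0**2 - 1)
```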

The exceptional values of $x_0$ are $-1$, $0$, $+1$, where the locus has $y_0 = 0$.
The geometry of the factorization is not so clear in this case, but the algebraic
computation remains valid. Thus we have $(x - x_0)R' = (x - x_0, y)^2$ if $x_0$ equals
$-1$, $0$, or $+1$. The conclusion is that the nonzero prime ideals of $R'$ are of two
types, with $(x - x_0)R'$ equal to

the product of two distinct prime ideals in $R'$ if $x_0$ is not in $\{-1, 0, +1\}$,
the square of a prime ideal in $R'$ if $x_0$ is in $\{-1, 0, +1\}$.

The third type, with $(x - x_0)R'$ prime in $R'$, does not arise. Toward the end of
Chapter IX we shall see how we could have anticipated the absence of the third
type.

That  is  enough  of  a  comparison  for  now.  Certain  structural  results  useful  in 
both  algebraic  number  theory  and  algebraic  geometry  are  needed  even  before 
we  get  started  at  factoring  ideals,  and  those  are  some  of  the  topics  for  the 
remainder  of  this  chapter.  In  Section  11  we  conclude  by  establishing  unique 
factorization  of  ideals  for  a  class  of  examples  that  includes  the  examples  above. 
In the examples above, the rings we considered were $\mathbb{Z}[X]/(X^2 + 5) \cong \mathbb{Z}[\sqrt{-5}\,]$
and $\mathbb{C}[x, y]/(y^2 - (x - 1)x(x + 1)) \cong \mathbb{C}[x][\sqrt{(x - 1)x(x + 1)}\,]$. In each case
the notation $[\,\cdot\,]$ refers to forming the ring generated by the coefficients and the
expression or expressions in brackets.

First  we  establish  a  result  saying  that  ideals  in  the  rings  of  interest  are  not 
too  wild.  For  example,  in  algebraic  geometry,  one  wants  to  consider  the  set  of 
restrictions of the members of $K[X_1, \dots, X_n]$, $K$ being a field, to the locus of
common zeros of a set of polynomials. The general tool will tell us that any ideal
in $K[X_1, \dots, X_n]$ is finitely generated; thus a description of what polynomials
vanish  on  the  locus  under  study  is  not  completely  out  of  the  question.  The  tool  is 
the  Hilbert  Basis  Theorem  and  is  the  main  result  of  Section  8. 

Second we need a way of understanding, in a more general setting, the relationship
that we used in the above examples between $\mathbb{Z}$ and $\mathbb{Z}[\sqrt{-5}\,]$, and between
$\mathbb{C}[x]$ and $\mathbb{C}[x][\sqrt{(x - 1)x(x + 1)}\,]$. The tool is the notion of integral closure and
is the subject of Section 9.

Third  we  need  a  way  of  isolating  the  behavior  of  prime  ideals,  of  eliminating 
the  influence  of  algebraic  or  geometric  factors  that  have  nothing  to  do  with  the 
prime  ideal  under  study.  The  tool  is  the  notion  of  localization  and  is  the  subject 
of  Section  10. 

In Section 11 we make use of these three tools to establish unique factorization of ideals for a class of integral domains known as “Dedekind domains.” It is easy to see that principal ideal domains are Dedekind domains, and we shall show that many other integral domains, including the examples above, are Dedekind domains. A refined theorem producing Dedekind domains will be obtained toward the end of Chapter IX once we have introduced the notion of a “separable” extension of fields.
8. Noetherian Rings and the Hilbert Basis Theorem

In  this  section,  R  will  be  a  commutative  ring  with  identity,  and  all  R  modules 
will  be  assumed  unital.  We  begin  by  introducing  three  equivalent  conditions  on 
a  unital  R  module. 

Proposition 8.30. If $R$ is a commutative ring with identity and $M$ is a unital $R$ module, then the following conditions on $R$ submodules of $M$ are equivalent:

(a) (ascending chain condition) every strictly ascending chain of $R$ submodules $M_1 \subsetneq M_2 \subsetneq \cdots$ terminates in finitely many steps,

(b) (maximum condition) every nonempty collection of $R$ submodules has a maximal element under inclusion,

(c) (finite basis condition) every $R$ submodule is finitely generated.

Proof. To see that (a) implies (b), let $\mathcal{C}$ be a nonempty collection of $R$ submodules of $M$. Take $M_1$ in $\mathcal{C}$. If $M_1$ is not maximal, choose $M_2$ in $\mathcal{C}$ properly containing $M_1$. If $M_2$ is not maximal, choose $M_3$ in $\mathcal{C}$ properly containing $M_2$. Continue in this way. By (a), this process must terminate, and then we have found a maximal $R$ submodule in $\mathcal{C}$.

To see that (b) implies (c), let $N$ be an $R$ submodule of $M$, and let $\mathcal{C}$ be the collection of all finitely generated $R$ submodules of $N$. This collection is nonempty since $0$ is in it. By (b), $\mathcal{C}$ has a maximal element, say $N'$. If $x$ is in $N$ but $x$ is not in $N'$, then $N' + Rx$ is a finitely generated $R$ submodule of $N$ that properly contains $N'$ and therefore gives a contradiction. We conclude that $N' = N$, and therefore $N$ is finitely generated.

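To see that (c) implies (a), let $M_1 \subsetneq M_2 \subsetneq \cdots$ be given, and put $N = \bigcup_{n=1}^{\infty} M_n$. Because the $M_n$ increase with $n$, $N$ is an $R$ submodule of $M$, so (c) says that $N$ is finitely generated, and some $M_{n_0}$ contains all of the finitely many generators. Then $M_{n_0} = N$, and the sequence stops no later than at $M_{n_0}$. □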

Let  us  apply  Proposition  8.30  with  M  taken  to  be  the  unital  R  module  R.  As 
always,  the  R  submodules  of  R  are  the  ideals  of  R. 

Corollary 8.31. If $R$ is a commutative ring with identity, then the following conditions on $R$ are equivalent:

(a) ascending chain condition for ideals: every strictly ascending chain of ideals in $R$ is finite,

(b) maximum condition for ideals of $R$: every nonempty collection of ideals in $R$ has a maximal element under inclusion,

(c) finite basis condition for ideals: every ideal in $R$ is finitely generated.
The corollary follows immediately from Proposition 8.30. A commutative ring with identity satisfying the equivalent conditions of Corollary 8.31 is said to be a Noetherian commutative ring.

Examples. 

(1) Principal ideal domains, such as $\mathbb{Z}$ and $\mathbb{K}[X]$ when $\mathbb{K}$ is a field. The finite basis condition for ideals is satisfied since every ideal is singly generated. The fact that (c) implies (a) has already been proved by hand for principal ideal domains twice in this chapter: once in the proof of (UFD1) for a principal ideal domain in Theorem 8.15 and once in the proof of Lemma 8.26.

(2) Any homomorphic image $R'$ of a Noetherian commutative ring $R$, provided $1$ maps to $1$. In fact, if $I' \subseteq R'$ is an ideal, its inverse image $I$ is an ideal in $R$; since the homomorphism is onto, the image of a finite set of generators of $I$ is a finite set of generators of $I'$.

(3) $\mathbb{K}[X_1, \dots, X_n]$ when $\mathbb{K}$ is a field. This commutative ring is Noetherian by application of the Hilbert Basis Theorem (Theorem 8.32 below) and induction on $n$. This ring is also a unique factorization domain, as we saw in Section 5.

(4) $\mathbb{Z}[X]$. This commutative ring is Noetherian, also by the Hilbert Basis Theorem below. Example 2 shows therefore that the quotient $\mathbb{Z}[\sqrt{-5}\,] = \mathbb{Z}[X]/(X^2 + 5)$ is Noetherian. This ring is an integral domain, and we have seen that it is not a unique factorization domain.
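As a small illustration of Example 1 (not taken from the text): in $\mathbb{Z}$ the ideal $(a_1, \dots, a_n)$ equals $(\gcd(a_1, \dots, a_n))$, so an ascending chain $(a_1) \subseteq (a_1, a_2) \subseteq (a_1, a_2, a_3) \subseteq \cdots$ is described by a sequence of nonnegative generators, each dividing the one before. The Python sketch below, with the hypothetical helper name `chain_generators`, records these generators and the indices at which the chain grows strictly. Once some generator is positive, every later strict step replaces it by a proper divisor, which can happen only finitely often; this is the ascending chain condition in $\mathbb{Z}$.

```python
from math import gcd
from typing import List, Tuple


def chain_generators(elements: List[int]) -> Tuple[List[int], List[int]]:
    """For the chain (a_1) ⊆ (a_1, a_2) ⊆ ... in Z, return the nonnegative
    generator g_n = gcd(a_1, ..., a_n) of each ideal, together with the indices
    n at which the n-th ideal strictly contains the previous one (for n = 1 the
    comparison is with the zero ideal)."""
    generators: List[int] = []
    strict_steps: List[int] = []
    g = 0  # gcd of the empty list: the zero ideal
    for n, a in enumerate(elements, start=1):
        new_g = gcd(g, a)
        # The ideal grows strictly exactly when its nonnegative generator changes.
        if new_g != g:
            strict_steps.append(n)
        g = new_g
        generators.append(g)
    return generators, strict_steps


gens, steps = chain_generators([360, 84, 150, 49, 10])
print(gens)   # [360, 12, 6, 1, 1]
print(steps)  # [1, 2, 3, 4]
```

Here the chain grows strictly at the first four steps and is constant afterward, because the generator has reached $1$ and $(1) = \mathbb{Z}$.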

Theorem 8.32 (Hilbert Basis Theorem). If $R$ is a nonzero Noetherian commutative ring, then so is $R[X]$.

Proof. If $I$ is an ideal in $R[X]$ and if $k \ge 0$ is an integer, let $L_k(I)$ be the union of $\{0\}$ and the set of all nonzero elements of $R$ that appear as the coefficient of $X^k$ in some element of degree $k$ in $I$. First let us see that $\{L_k(I)\}_{k \ge 0}$ is an increasing sequence of ideals in $R$. In fact, if $A(X)$ and $B(X)$ are polynomials of degree $k$ in $I$ with leading terms $a_kX^k$ and $b_kX^k$, then $A(X) + B(X)$ has degree $k$ if $b_k \ne -a_k$, and hence $a_k + b_k$ is in $L_k(I)$ in every case. Similarly if $r$ is in $R$ and $ra_k \ne 0$, then $rA(X)$ has degree $k$, and hence $ra_k$ is in $L_k(I)$ in every case. Consequently $L_k(I)$ is an ideal in $R$. Since $I$ is closed under multiplication by $X$, $L_k(I) \subseteq L_{k+1}(I)$ for all $k \ge 0$.

Next let us prove that if $J$ is any ideal in $R[X]$ such that $I \subseteq J$ and $L_k(I) = L_k(J)$ for all $k \ge 0$, then $I = J$. Let $B(X)$ be in $J$ with $\deg B(X) = k$. Arguing by contradiction, we may suppose that $B(X)$ is not in $I$ and that $k$ is the smallest possible degree of a polynomial in $J$ but not in $I$. Since $L_k(I) = L_k(J)$, we can find $A(X)$ in $I$ whose leading term is the same as the leading term of $B(X)$. Since $B(X)$ is not in $I$, $B(X) - A(X)$ is not in $I$. Since $I \subseteq J$, $B(X) - A(X)$ is in $J$. Since $\deg(B(X) - A(X)) \le k - 1$, we have arrived at a contradiction to the defining property of $k$. We conclude that $I = J$.
Now let $\{I_j\}_{j \ge 0}$ be an ascending chain of ideals in $R[X]$, and form $L_i(I_j)$ for each $i$ and $j$. When $i$ or $j$ is fixed, these ideals are increasing as a function of the other index, $j$ or $i$. By the maximum condition in $R$, the collection of all the ideals $L_i(I_j)$ has a maximal element, say $L_p(I_q)$. For $i \ge p$ and $j \ge q$, we have $L_i(I_j) \supseteq L_p(I_q)$, and thus $L_i(I_j) = L_p(I_q)$ by maximality. The case $j = q$ gives $L_p(I_q) = L_i(I_q)$, and therefore $L_i(I_j) = L_i(I_q)$ for $i \ge p$ and $j \ge q$. For any fixed $i$, the ascending chain condition on ideals gives $L_i(I_j) = L_i(I_{n(i)})$ for $j \ge n(i)$, and the above argument shows that we may take $n(i) = q$ if $i \ge p$. Hence $n(i)$ may be taken to be bounded in $i$, say by $n_0 = \max(q, n(0), \dots, n(p - 1))$, and $L_i(I_j) = L_i(I_{n_0})$ for all $i \ge 0$ and $j \ge n_0$. By the result of the previous paragraph, applied with $I = I_{n_0}$ and $J = I_j$, we obtain $I_j = I_{n_0}$ for $j \ge n_0$, and hence the ascending chain condition has been verified for ideals in $R[X]$. □
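To see the ideals $L_k(I)$ of the proof in a small case, here is a worked example chosen for illustration (it does not appear in the text): take $R = \mathbb{Z}$ and $I = (2, X)$ in $\mathbb{Z}[X]$. A member $2a(X) + Xb(X)$ of $I$ that is a constant equals its value $2a(0)$ at $0$, so the degree-0 members of $I$ are exactly the nonzero even integers, while $X^k \in I$ has leading coefficient $1$ for every $k \ge 1$.

```latex
\begin{aligned}
L_0(I) &= 2\mathbb{Z}, \qquad L_k(I) = \mathbb{Z} \quad (k \ge 1), \\
L_0(I) &= 2\mathbb{Z} \subsetneq L_1(I) = L_2(I) = L_3(I) = \cdots = \mathbb{Z}.
\end{aligned}
```

The increasing sequence $\{L_k(I)\}$ is constant from $k = 1$ on, as the ascending chain condition in $\mathbb{Z}$ guarantees must eventually happen.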

Proposition  8.33.  In  a  Noetherian  integral  domain  R,  every  nonzero  nonunit 
is  a  product  of  irreducible  elements. 

Remark. The proof below gives an alternative argument for (UFD1) in Theorem 8.15, an argument that does not so explicitly use the full force of Zorn’s Lemma.

Proof. Let $a_1$ be a nonzero nonunit of $R$. If $a_1$ is not irreducible, then $a_1$ has a factorization $a_1 = a_2b_2$ in which neither $a_2$ nor $b_2$ is a unit. If $a_2$ is not irreducible, then $a_2$ has a factorization $a_2 = a_3b_3$ in which neither $a_3$ nor $b_3$ is a unit. We continue in this way as long as it is possible to do so. Let us see that this process cannot continue indefinitely. Assume the contrary. The equality $a_1 = a_2b_2$ with $b_2$ not a unit says that the inclusion of ideals $(a_1) \subseteq (a_1, a_2)$ is proper: if $a_2 = a_1e$, then $a_2 = a_2b_2e$, and cancellation of $a_2 \ne 0$ would make $b_2$ a unit. Arguing in this way with $a_2$, $a_3$, and so on, we obtain

$$(a_1) \subsetneq (a_1, a_2) \subsetneq (a_1, a_2, a_3) \subsetneq \cdots,$$

in contradiction to the ascending chain condition for ideals. Because of this contradiction we conclude that for some $n$, $a_n$ does not have any decomposition $a_n = a_{n+1}b_{n+1}$ in which neither factor is a unit. Hence $a_n$ is irreducible. The upshot is that our original element $a_1$ has an irreducible factor, say $c_1$.

Write $a_1 = c_1d_2$. If $d_2$ is not a unit, repeat the process with it, obtaining $d_2 = c_2d_3$ with $c_2$ irreducible. If $d_3$ is not a unit, we can again repeat this process. This process cannot continue indefinitely because otherwise, by the same cancellation argument applied to each equality $d_k = c_kd_{k+1}$ with $c_k$ not a unit, we would have a strictly increasing sequence of ideals

$$(d_2) \subsetneq (d_3) \subsetneq (d_4) \subsetneq \cdots,$$

in contradiction to the ascending chain condition for ideals. Thus for some $n$, we have $a_1 = c_1c_2 \cdots c_nd_{n+1}$ with $c_1, \dots, c_n$ irreducible and with $d_{n+1}$ equal to a unit. Grouping $c_n$ and $d_{n+1}$ as a single irreducible factor, we obtain the desired factorization of the given element $a_1$. □
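In $\mathbb{Z}$ the two stages of the proof can be imitated concretely. The Python sketch below is only an illustration: the names `proper_factor`, `irreducible_factor`, and `irreducible_factors` are ours, and trial division is simply one convenient way of finding a factorization into two nonunits. The first function follows the chain $a_1 = a_2b_2$, $a_2 = a_3b_3$, and so on until an irreducible element appears; the second splits off irreducible factors $c_1, c_2, \dots$ until the cofactor is a unit. In $\mathbb{Z}$ termination is also visible directly, since absolute values drop at each step; in a general Noetherian domain it is the ascending chain condition that stops both processes.

```python
from typing import List


def proper_factor(n: int) -> int:
    """Return the least divisor d of n with 1 < d < |n|, or 0 if there is none.
    Assumes |n| >= 2, i.e. n is a nonzero nonunit of Z."""
    m = abs(n)
    d = 2
    while d * d <= m:
        if m % d == 0:
            return d
        d += 1
    return 0


def irreducible_factor(a: int) -> int:
    """First stage: write a_k = a_{k+1} * b_{k+1} with b_{k+1} = d and
    a_{k+1} = a_k // d both nonunits, and continue with a_{k+1} until no such
    factorization exists; the last a_k is irreducible."""
    current = a
    while True:
        d = proper_factor(current)
        if d == 0:
            return current
        current //= d  # exact division, so the sign of a is kept


def irreducible_factors(a: int) -> List[int]:
    """Second stage: a_1 = c_1 d_2, d_2 = c_2 d_3, ..., stopping when the
    cofactor is a unit, which is then absorbed into the last factor."""
    if abs(a) < 2:
        raise ValueError("expected a nonzero nonunit of Z")
    factors: List[int] = []
    d = a
    while abs(d) != 1:
        c = irreducible_factor(d)
        factors.append(c)
        d //= c
    factors[-1] *= d  # d is now the unit 1 or -1
    return factors


print(irreducible_factors(-60))  # [-5, 3, 2, 2]
```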
Proposition 8.34. If $R$ is a Noetherian commutative ring, then any $R$ submodule of a finitely generated unital $R$ module is finitely generated.

Remark.  The  proof  follows  the  lines  of  the  argument  for  Proposition  8.24. 

Proof. Let $M$ be a unital finitely generated $R$ module with a set $\{m_1, \dots, m_n\}$ of $n$ generators, and define $M_k = Rm_1 + \cdots + Rm_k$ for $1 \le k \le n$. Then $M_n = M$ since $M$ is unital. We shall prove by induction on $k$ that every $R$ submodule of $M_k$ is finitely generated. The case $k = n$ then gives the proposition. For $k = 1$, suppose that $S$ is an $R$ submodule of $M_1 = Rm_1$. Let $I$ be the subset of all $r$ in $R$ with $rm_1$ in $S$. Since $S$ is an $R$ submodule, $I$ is an ideal in $R$, necessarily finitely generated since $R$ is Noetherian. Let $I = (r_1, \dots, r_l)$. Then $S = Im_1 = Rr_1m_1 + Rr_2m_1 + \cdots + Rr_lm_1$, and the elements $r_1m_1, r_2m_1, \dots, r_lm_1$ form a finite set of generators of $S$.

Assume inductively that every $R$ submodule of $M_k$ is known to be finitely generated, and let $N_{k+1}$ be an $R$ submodule of $M_{k+1}$. Let $q : M_{k+1} \to M_{k+1}/M_k$ be the quotient $R$ homomorphism, and let $\varphi$ be the restriction $q|_{N_{k+1}}$, mapping $N_{k+1}$ into $M_{k+1}/M_k$. Then $\ker\varphi = N_{k+1} \cap M_k$ is an $R$ submodule of $M_k$ and is finitely generated by the inductive hypothesis. Also, $\operatorname{image}\varphi$ is an $R$ submodule of $M_{k+1}/M_k$, which is singly generated with generator equal to the coset of $m_{k+1}$. Since an $R$ submodule of a singly generated unital $R$ module was shown in the previous paragraph to be finitely generated, $\operatorname{image}\varphi$ is finitely generated. Applying Lemma 8.23 to $\varphi$, we see that $N_{k+1}$ is finitely generated. This completes the induction and the proof. □


9.  Integral  Closure 

In this section, we let $R$ be an integral domain, $F$ be its field of fractions, and $K$ be any field containing $F$. Sometimes we shall assume also that $\dim_F K$ is finite. The main cases of interest are as follows.

Examples  of  greatest  interest. 

(1) $R = \mathbb{Z}$, $F = \mathbb{Q}$, and $\dim_F K < \infty$. In Chapter IX we shall see in this case from the “Theorem of the Primitive Element” that $K$ is necessarily of the form $\mathbb{Q}[\theta]$ as already described in Section 1 and in Chapter IV. This is the setting we used in Section 7 as orientation for certain problems in algebraic number theory.

(2) $R = \mathbb{K}[X]$ for a field $\mathbb{K}$, $F = \mathbb{K}(X)$ is the field of fractions of $R$, and $K$ is a field containing $F$ with $\dim_F K < \infty$. In the special case $\mathbb{K} = \mathbb{C}$, this is the setting we used in Section 7 as orientation for treating curves in algebraic geometry.
Proposition 8.35. Let $R$ be an integral domain, $F$ be its field of fractions, and $K$ be any field containing $F$. Then the following conditions on an element $x$ of $K$ are equivalent:

(a) $x$ is a root of a monic polynomial in $R[X]$,

(b) the subring $R[x]$ of $K$ generated by $R$ and $x$ is a finitely generated $R$ module,

(c) there exists a finitely generated nonzero unital $R$ module $M \subseteq K$ such that $xM \subseteq M$.

Remark. When the equivalent conditions of the proposition are satisfied, we say that $x$ is integral over $R$ or $x$ is integrally dependent on $R$. In this terminology, in Section VII.5 and in Section 1 of the present chapter, we defined an algebraic integer to be any member of $\mathbb{C}$ that is integral over $\mathbb{Z}$. The equivalence of (a) and (c) in this setting allowed us to prove that the set of algebraic integers is a subring of $\mathbb{C}$.

Proof. If (a) holds, we can write $x^n + a_{n-1}x^{n-1} + \cdots + a_1x + a_0 = 0$ for suitable coefficients in $R$. Solving for $x^n$ and substituting repeatedly, we see that the subring $R[x]$, which equals $R + Rx + Rx^2 + \cdots$, is actually given by $R[x] = R + Rx + \cdots + Rx^{n-1}$. Therefore $R[x]$ is a finitely generated $R$ module, and (b) holds.

If (b) holds, then we can take $M = R[x]$, which is nonzero since it contains $1$, to see that (c) holds.

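If (c) holds, let $m_1, \dots, m_k$ be generators of $M$ as an $R$ module. Then we can find members $a_{ij}$ of $R$ for which

$$\begin{aligned} xm_1 &= a_{11}m_1 + \cdots + a_{1k}m_k, \\ &\ \,\vdots \\ xm_k &= a_{k1}m_1 + \cdots + a_{kk}m_k. \end{aligned}$$

This set of equations, regarded as a single matrix equation over $K$, becomes

$$\begin{pmatrix} x - a_{11} & -a_{12} & \cdots & -a_{1k} \\ -a_{21} & x - a_{22} & \cdots & -a_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ -a_{k1} & -a_{k2} & \cdots & x - a_{kk} \end{pmatrix} \begin{pmatrix} m_1 \\ m_2 \\ \vdots \\ m_k \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}.$$

Since $M$ is nonzero, the $m_i$ are not all $0$. The $k$-by-$k$ matrix on the left is therefore not invertible, and its determinant, which is a member of the field $K$, must be $0$. Expanding the determinant and replacing $x$ by an indeterminate $X$, we obtain a monic polynomial of degree $k$ in $R[X]$ for which $x$ is a root. Thus (a) holds. □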
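For $k = 2$ the determinant in the proof expands to $X^2 - (a_{11} + a_{22})X + (a_{11}a_{22} - a_{12}a_{21})$. The Python sketch below (the function name `monic_poly_2x2` is ours) applies this formula to two examples chosen for illustration: $x = \sqrt{2}$ acting on $M = \mathbb{Z} + \mathbb{Z}\sqrt{2}$, where $x \cdot 1 = \sqrt{2}$ and $x \cdot \sqrt{2} = 2$, and $x = \frac12(-1 + \sqrt{-3}\,)$ acting on $M = \mathbb{Z} + \mathbb{Z}x$, where $x \cdot x = -1 - x$. Coefficient lists are written constant term first.

```python
from typing import List


def monic_poly_2x2(a: List[List[int]]) -> List[int]:
    """Given integers a[i][j] with x*m_i = a[i][0]*m_1 + a[i][1]*m_2, return
    det(X*I - A) as coefficients [c_0, c_1, c_2] (constant term first).
    The result is monic of degree 2 and has x as a root."""
    trace = a[0][0] + a[1][1]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [det, -trace, 1]


# x = sqrt(2) on m_1 = 1, m_2 = sqrt(2):  x*1 = sqrt(2),  x*sqrt(2) = 2*1
print(monic_poly_2x2([[0, 1], [2, 0]]))    # [-2, 0, 1], i.e. X^2 - 2

# x = (-1 + sqrt(-3))/2 on m_1 = 1, m_2 = x:  x*1 = x,  x*x = -1 - x
print(monic_poly_2x2([[0, 1], [-1, -1]]))  # [1, 1, 1], i.e. X^2 + X + 1
```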


If  R,  F,  and  K  are  as  above,  the  integral  closure  of  R  in  K  is  the  set  of  all 
members  of  K  that  are  integral  over  R.  In  Corollary  8.38  we  shall  prove  that  the 
integral  closure  of  R  in  K  is  a  subring  of  K. 
Examples of integral closures.

(1) The integral closure of $\mathbb{Z}$ in $\mathbb{Q}$ is $\mathbb{Z}$ itself. This fact amounts to the statement that a rational root of a monic polynomial with integer coefficients is an integer; this was proved¹⁰ in the course of Lemma 7.30. Recall the argument: If $x = p/q$ is a rational number in lowest terms that satisfies $x^n + a_{n-1}x^{n-1} + \cdots + a_1x + a_0 = 0$, then we clear fractions and obtain $p^n + a_{n-1}p^{n-1}q + \cdots + a_1pq^{n-1} + a_0q^n = 0$. Examining divisibility by $q$, we see that $q$ divides $p^n$. Hence any prime factor of $q$ divides $p$ and shows that $p/q$ cannot be in lowest terms. Therefore $q$ has no prime factors, so $q = \pm 1$, and $p/q$ is an integer.

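(2) Let us determine the integral closure of $\mathbb{Z}$ in $\mathbb{Q}(\sqrt{m}\,)$, where $m$ is a square-free integer other than $0$ or $1$. The result is going to be that the integral closure consists of all $a + b\sqrt{m}$ with

$$a \text{ and } b \quad \begin{cases} \text{both in } \mathbb{Z} & \text{if } m \not\equiv 1 \bmod 4, \\ \text{both in } \mathbb{Z} \text{ or both in } \mathbb{Z} + \tfrac{1}{2} & \text{if } m \equiv 1 \bmod 4. \end{cases}$$

In other words, the integral closure is

$$\begin{cases} \mathbb{Z}[\sqrt{m}\,] & \text{if } m \not\equiv 1 \bmod 4, \\ \mathbb{Z}[\tfrac{1}{2}(1 + \sqrt{m}\,)] & \text{if } m \equiv 1 \bmod 4. \end{cases} \qquad (*)$$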

In fact, consider the polynomial

$$P(X) = X^2 - 2aX + (a^2 - mb^2),$$

whose roots are exactly $a \pm b\sqrt{m}$. If $a$ and $b$ are in $\mathbb{Z}$, then $P(X)$ has coefficients in $\mathbb{Z}$, and hence both of $a \pm b\sqrt{m}$ are in the integral closure. If $m \equiv 1 \bmod 4$ and $a$ and $b$ are both in $\mathbb{Z} + \frac12$, write $a = c/2$ and $b = d/2$ with $c$ and $d$ in $2\mathbb{Z} + 1$. Since $a^2 - mb^2 = \frac14(c^2 - md^2)$, we have

$$c^2 - md^2 \equiv c^2 - d^2 \equiv 1 - 1 \equiv 0 \bmod 4,$$

and therefore $\frac14(c^2 - md^2) = a^2 - mb^2$ is in $\mathbb{Z}$. Consequently the polynomial $P(X)$ exhibits $a + b\sqrt{m}$ as in the integral closure.

For the reverse inclusion, suppose that $z = a + b\sqrt{m}$ is in the integral closure and is not in $\mathbb{Z}$; by Example 1, $z$ is then not rational, so that $b \ne 0$. Then $z$ is a root of some monic polynomial $A(X)$ in $\mathbb{Z}[X]$. In addition, $z$ is a root of $P(X)$ above, and $P(X)$ is a monic prime polynomial in $\mathbb{Q}[X]$ because it has no rational first-degree factor. Writing $A(X) = B(X)P(X) + R(X)$ in $\mathbb{Q}[X]$ with $R(X) = 0$ or $\deg R(X) < \deg P(X) = 2$ and substituting $z$ for $X$, we see that $R(z) = 0$, and we conclude that $R(X) = 0$. Thus $P(X)$ divides $A(X)$. By Corollary 8.20c, $P(X)$ is in $\mathbb{Z}[X]$. Hence $2a$ and $a^2 - mb^2$ are in $\mathbb{Z}$. One case is that $a$ is in $\mathbb{Z}$, and then $mb^2$ is in $\mathbb{Z}$; since $m$ is square free, there are no candidates for primes dividing the denominator of $b$, and so $b$ is in $\mathbb{Z}$. The other case is that $a$ is in $\mathbb{Z} + \frac12$, and then $mb^2$ is in $\mathbb{Z} + \frac14$. So $m(2b)^2$ is in $4\mathbb{Z} + 1$. Since $m$ is square free, there are no candidates for primes dividing the denominator of $2b$, and $2b$ is an integer. Since $m(2b)^2$ is in $4\mathbb{Z} + 1$, $m \equiv 1 \bmod 4$ and $2b \equiv 1 \bmod 2$ are forced. This completes the proof that the integral closure is given by (*).

¹⁰ It is not assumed that the reader has looked at Chapter VII. A result that implies Lemma 7.30 will be obtained below as Corollary 8.38, which makes no use of material from Chapter VII.

(3) Under the assumption that the characteristic of the field $\mathbb{K}$ is not 2, let us determine the integral closure $T$ of $R = \mathbb{K}[x]$ in $K = \mathbb{K}(x)[\sqrt{P(x)}\,] = \mathbb{K}(x)[y]/(y^2 - P(x))$, where $P(x)$ is a square-free polynomial in $\mathbb{K}[x]$. Parenthetically we need to check that $K$ is a field. Since $\mathbb{K}(x)$ is a field, $\mathbb{K}(x)[y]$ is a principal ideal domain, and the question is whether $(y^2 - P(x))$ is a prime ($=$ maximal) ideal. We have only to observe that $y^2 - P(x)$ is irreducible because $P(x)$ is not a square, and then it follows that $K$ is a field. Thus the situation for this example fits the setting of Proposition 8.35 with $R = \mathbb{K}[x]$, $F = \mathbb{K}(x)$, and $K = F[y]/(y^2 - P)$. We are going to show that the integral closure $T$ of $R$ in $K$ consists of all $A(x) + B(x)\sqrt{P(x)}$ with $A(x)$ and $B(x)$ both in $R = \mathbb{K}[x]$. It follows that the integral closure will be

$$T = \mathbb{K}[x][\sqrt{P(x)}\,] = \mathbb{K}[x] + \mathbb{K}[x]\sqrt{P(x)}. \qquad (*)$$

To see this, first let $A(x)$ and $B(x)$ be in $\mathbb{K}[x]$, and consider the monic polynomial

$$Q(y) = y^2 - 2Ay + (A^2 - PB^2) \qquad (**)$$

in $\mathbb{K}[x][y]$. Its roots in $K$ are exactly $A(x) \pm B(x)\sqrt{P(x)}$, and thus we see that both of $A(x) \pm B(x)\sqrt{P(x)}$ are in $T$. Conversely let $z = A(x) + B(x)\sqrt{P(x)}$ be in $T$ but not in $R$. Here $A(x)$ and $B(x)$ are in $\mathbb{K}(x)$. Then $z$ is a root in $K$ of some monic polynomial $M(y)$ whose coefficients are in $\mathbb{K}[x]$. In addition, $z$ is a root of the member $Q(y)$ of $\mathbb{K}(x)[y]$ defined in (**). The division algorithm gives $M(y) = N(y)Q(y) + W(y)$ in $\mathbb{K}(x)[y]$ with $W = 0$ or $\deg W < \deg Q = 2$. Substituting $z \in T$ for $y$, we obtain

$$0 = M(z) = N(z)Q(z) + W(z) = N(z) \cdot 0 + W(z).$$

Thus $W(z) = 0$. If $\deg W = 1$, then $z$ is in $F$, and the same argument as in Example 1 shows that $z$ is in $R$; since we are assuming that $z$ is not in $R$, we conclude that $W = 0$. Therefore $Q(y)$ divides $M(y)$. By Corollary 8.20c, $Q(y)$ is in $\mathbb{K}[x][y]$. Hence $2A$ and $A^2 - PB^2$ are in $\mathbb{K}[x]$. Since the characteristic of $\mathbb{K}$ is not 2, $A$ is in $\mathbb{K}[x]$. Then $PB^2$ is in $\mathbb{K}[x]$, and $B$ must be in $\mathbb{K}[x]$ since $P$ is square free. Thus $T$ is given as in (*).
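Examples 1 and 2 lend themselves to a quick computational check. The Python sketch below is an illustration rather than part of the text, and the function names are ours. `rational_roots_of_monic` uses the argument of Example 1: a rational root of a monic integer polynomial is an integer, and a nonzero root divides the lowest nonzero coefficient, so only finitely many integers need testing. `is_integral_quadratic` uses the polynomial $P(X)$ of Example 2: for rational $a$ and $b$, the element $a + b\sqrt{m}$ is integral over $\mathbb{Z}$ exactly when $2a$ and $a^2 - mb^2$ are integers. (When $b = 0$ this still gives the right answer, since $2a \in \mathbb{Z}$ and $a^2 \in \mathbb{Z}$ force $a \in \mathbb{Z}$.)

```python
from fractions import Fraction
from typing import List


def rational_roots_of_monic(coeffs: List[int]) -> List[int]:
    """Rational roots of X^n + c_{n-1} X^{n-1} + ... + c_0, given as
    coeffs = [c_0, ..., c_{n-1}, 1] (constant term first).

    By Example 1 every rational root is an integer, and a nonzero root
    divides the lowest nonzero coefficient, so testing 0 and those divisors
    finds every rational root."""
    if coeffs[-1] != 1:
        raise ValueError("polynomial must be monic")

    def value(x: int) -> int:
        total = 0
        for c in reversed(coeffs):  # Horner's rule
            total = total * x + c
        return total

    low = next(c for c in coeffs if c != 0)
    candidates = {0}
    for d in range(1, abs(low) + 1):
        if low % d == 0:
            candidates.update({d, -d})
    return sorted(x for x in candidates if value(x) == 0)


def is_integral_quadratic(a: Fraction, b: Fraction, m: int) -> bool:
    """Decide whether a + b*sqrt(m) is integral over Z, for rational a, b and
    square-free m other than 0 or 1, by checking that
    P(X) = X^2 - 2aX + (a^2 - m b^2) has integer coefficients."""
    trace = 2 * a
    norm = a * a - m * b * b
    return trace.denominator == 1 and norm.denominator == 1


half = Fraction(1, 2)
print(rational_roots_of_monic([6, -5, 1]))    # [2, 3]
print(rational_roots_of_monic([-2, 0, 1]))    # []
print(is_integral_quadratic(half, half, 5))   # True
print(is_integral_quadratic(half, half, -3))  # True
print(is_integral_quadratic(half, half, -5))  # False
```

The last three lines match (*): $\frac12(1 + \sqrt{m}\,)$ is integral for $m = 5$ and $m = -3$, which are $\equiv 1 \bmod 4$, but not for $m = -5$.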
From these examples we can extract a rough description of the situation that will interest us. We start with a ring $R$ such as $\mathbb{Z}$ or $\mathbb{K}[x]$, along with its field of fractions $F$. We assume that the integral closure of $R$ in $F$ is $R$ itself, as is the case with $\mathbb{Z}$ in $\mathbb{Q}$ and as we shall see is the case with $\mathbb{K}[x]$ in $\mathbb{K}(x)$. Let $K$ be a field containing $F$ with $\dim_F K < \infty$. We are interested in an analog $T$ of integral elements relative to $K$, and what works as $T$ is the integral closure of $R$ in $K$.

Lemma 8.36. If $A$, $B$, and $C$ are integral domains with $A \subseteq B \subseteq C$ such that $C$ is a finitely generated $B$ module and $B$ is a finitely generated $A$ module, then $C$ is a finitely generated $A$ module.

Proof. Let $C$ be generated over $B$ by $c_1, \dots, c_r$, and let $B$ be generated over $A$ by $b_1, \dots, b_s$. Then $C$ is generated over $A$ by the $sr$ elements $b_jc_i$ for $1 \le i \le r$ and $1 \le j \le s$. □

Proposition 8.37. Let $R$ be an integral domain, $F$ be its field of fractions, and $K$ be any field containing $F$. If $x_1, \dots, x_r$ are members of $K$ integral over $R$, then the subring $R[x_1, \dots, x_r]$ of $K$ generated by $R$ and $x_1, \dots, x_r$ is a finitely generated $R$ module.

Remarks. The ring $R[x_1, \dots, x_r]$ is certainly finitely generated over $R$ as a ring. The proposition asserts more: that it is finitely generated as an $R$ module. This means that all products of powers of the $x_i$'s are in the $R$ linear span of finitely many of them.

Proof. We induct on $r$. Since $x_1$ is assumed integral over $R$, the case $r = 1$ follows from Proposition 8.35b. For the inductive step, suppose that $R[x_1, \dots, x_s]$ is a finitely generated $R$ module. Since $x_{s+1}$ is integral over $R$, it is certainly integral over $R[x_1, \dots, x_s]$. Thus Proposition 8.35b shows that $R[x_1, \dots, x_{s+1}]$ is a finitely generated $R[x_1, \dots, x_s]$ module. Taking $A = R$, $B = R[x_1, \dots, x_s]$, and $C = R[x_1, \dots, x_{s+1}]$ in Lemma 8.36, we see that $R[x_1, \dots, x_{s+1}]$ is a finitely generated $R$ module. □

Corollary 8.38. Let $R$ be an integral domain, $F$ be its field of fractions, and $K$ be any field containing $F$. Then the integral closure of $R$ in $K$ is a subring of $K$.

Remark.  A  special  case  of  this  corollary  appears  in  somewhat  different 
language  as  Lemma  7.30. 

Proof. Every $r$ in $R$ is a root of $X - r$, so the integral closure contains $R$, in particular $0$ and $1$. Let $x$ and $y$ be integral over $R$. Then $R[x, y]$ is a finitely generated $R$ module by Proposition 8.37. We have $(x \pm y)R[x, y] \subseteq R[x, y]$ and $(xy)R[x, y] \subseteq R[x, y]$. Taking $M = R[x, y]$ in Proposition 8.35c and using the implication that (c) implies (a) in that proposition, we see that $x \pm y$ and $xy$ are integral over $R$. □
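The proof of Corollary 8.38 is effective: the determinant in the proof of Proposition 8.35 produces an explicit monic equation. As an illustration not taken from the text, let $x = \sqrt{2} + \sqrt{3}$ act on $M = \mathbb{Z}[\sqrt{2}, \sqrt{3}\,]$ with generators $1, \sqrt{2}, \sqrt{3}, \sqrt{6}$; then $x \cdot 1 = \sqrt{2} + \sqrt{3}$, $x\sqrt{2} = 2 + \sqrt{6}$, $x\sqrt{3} = 3 + \sqrt{6}$, and $x\sqrt{6} = 3\sqrt{2} + 2\sqrt{3}$. The Python sketch below (all function names ours) computes $\det(XI - A)$ by cofactor expansion, storing polynomial entries as integer coefficient lists with the constant term first.

```python
from typing import List

Poly = List[int]  # coefficients, constant term first


def poly_add(p: Poly, q: Poly) -> Poly:
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0) for i in range(n)]


def poly_mul(p: Poly, q: Poly) -> Poly:
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def det(mat: List[List[Poly]]) -> Poly:
    """Determinant of a square matrix of polynomials, by expansion along row 0."""
    if len(mat) == 1:
        return mat[0][0]
    total: Poly = [0]
    for j, entry in enumerate(mat[0]):
        minor = [row[:j] + row[j + 1:] for row in mat[1:]]
        term = poly_mul(entry, det(minor))
        sign = 1 if j % 2 == 0 else -1  # cofactor sign (-1)^(0 + j)
        total = poly_add(total, [sign * c for c in term])
    return total


def char_poly(a: List[List[int]]) -> Poly:
    """det(X*I - A) for an integer matrix A; monic of degree k = len(a)."""
    k = len(a)
    mat = [[[-a[i][j], 1] if i == j else [-a[i][j]] for j in range(k)] for i in range(k)]
    p = det(mat)
    while len(p) > 1 and p[-1] == 0:  # drop trailing zero coefficients
        p.pop()
    return p


# Row i lists the coefficients of x*m_i in terms of 1, sqrt2, sqrt3, sqrt6.
A = [[0, 1, 1, 0],
     [2, 0, 0, 1],
     [3, 0, 0, 1],
     [0, 3, 2, 0]]
print(char_poly(A))  # [1, 0, -10, 0, 1], i.e. X^4 - 10X^2 + 1
```

So $\sqrt{2} + \sqrt{3}$ is a root of the monic polynomial $X^4 - 10X^2 + 1$ in $\mathbb{Z}[X]$, as the corollary predicts.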

Corollary 8.39. Let $A$, $B$, and $C$ be integral domains with $A \subseteq B \subseteq C$. If every member of $B$ is integral over $A$ and if every member of $C$ is integral over $B$, then every member of $C$ is integral over $A$.

Proof. Let $K$ be the field of fractions of $C$, and regard $C$ as a subring of $K$. If $x$ is in $C$, then $x$ is a root of some monic polynomial with coefficients in $B$, say $x^n + b_{n-1}x^{n-1} + \cdots + b_0 = 0$. By Proposition 8.37 the subring $D = A[b_{n-1}, \dots, b_0]$ of $C$ is a finitely generated $A$ module. Since $x$ is integral over $D$, $D[x]$ is a finitely generated $D$ module, by a second application of Proposition 8.37. Lemma 8.36 shows that $D[x]$ is a finitely generated $A$ module. By Proposition 8.35, $x$ is integral over $A$. □

We say that the integral domain $R$ is integrally closed if $R$ equals its integral closure in its field of fractions. Example 1 above in essence observed that the ring $\mathbb{Z}$ of integers is integrally closed. Example 2 above showed, for the case $m = -3$, that the integral closure of $\mathbb{Z}$ in $\mathbb{Q}[\sqrt{-3}\,]$ is something other than the ring $\mathbb{Z}[\sqrt{-3}\,]$; consequently $\mathbb{Z}[\sqrt{-3}\,]$ cannot be integrally closed. A more direct argument is to observe that the element $x = \frac12(-1 + \sqrt{-3}\,)$ of $\mathbb{Q}[\sqrt{-3}\,]$ satisfies $x^2 + x + 1 = 0$ but is not in $\mathbb{Z}[\sqrt{-3}\,]$.

Corollary  8.40.  Let  R  be  an  integral  domain,  F  be  its  field  of  fractions,  and 
K  be  any  field  containing  F.  Then  the  integral  closure  T  of  R  in  K  is  integrally 
closed. 

Proof. Corollary 8.38 shows that $T$ is a subring of $K$. Let $C$ be the integral closure of $T$ in $K$. We apply Corollary 8.39 to the integral domains $R \subseteq T \subseteq C$. The corollary says that every member of $C$ is integral over $R$, and hence $C \subseteq T$. That is, $C = T$. Let $\eta : T \to L$ be the one-one homomorphism of $T$ into its field of fractions, and let $\varphi : T \to K$ be the inclusion. By Proposition 8.6, there exists a unique ring homomorphism $\psi : L \to K$ such that $\varphi = \psi\eta$. Identifying $L$ with $\psi(L) \subseteq K$, we can treat $L$ as a subfield of $K$ containing $T$. Since the only elements of $K$ integral over $T$ have been shown to be the members of $T$, the only elements of the subfield $L$ integral over $T$ are the members of $T$. Therefore $T$ is integrally closed. □

Proposition  8.41.  If  R  is  a  unique  factorization  domain,  then  R  is  integrally 
closed. 

Proof. Suppose that $y^{-1}x$ is a member of the field of fractions $F$ of $R$, with $x$ and $y$ in $R$ and $y \ne 0$, and suppose that $y^{-1}x$ satisfies the equation

$$(y^{-1}x)^n + a_{n-1}(y^{-1}x)^{n-1} + \cdots + a_1(y^{-1}x) + a_0 = 0$$

with coefficients in $R$. Clearing fractions and moving $x^n$ over to one side by itself, we have

$$x^n = -y(a_{n-1}x^{n-1} + \cdots + a_1xy^{n-2} + a_0y^{n-1}).$$

If a prime $p$ in $R$ divides $y$, then it divides $x^n$ and must divide $x$. Since $R$ is a unique factorization domain, we may cancel common prime factors from $x$ and $y$ at the outset and assume that $\operatorname{GCD}(x, y) = 1$. Then no prime can divide $y$, and so $y$ is a unit in $R$. Consequently $y^{-1}x$ is in $R$. □

Since $\mathbb{Z}$ is a unique factorization domain, Proposition 8.41 gives a new proof that $\mathbb{Z}$ is integrally closed. We see also that $\mathbb{K}[x]$ is integrally closed when $\mathbb{K}$ is a field.

We saw above that the ring $\mathbb{Z}[\sqrt{-3}\,]$ is not integrally closed; consequently it cannot be a unique factorization domain. Another way of drawing this conclusion is to verify in the equality $(1 + \sqrt{-3}\,)(1 - \sqrt{-3}\,) = 2 \cdot 2$ that the two elements on the left are irreducible and are not associates of the irreducible element $2$ on the right.
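One standard way to carry out this verification, supplied here for illustration and not spelled out in the text, uses the norm $N(a + b\sqrt{-3}\,) = a^2 + 3b^2$, which is multiplicative and equals $1$ only for the units $\pm 1$. Each of $2$ and $1 \pm \sqrt{-3}$ has norm $4$, so a factorization of one of them into two nonunits would need a factor of norm $2$, and $a^2 + 3b^2 = 2$ has no integer solutions; since the only units are $\pm 1$, the element $1 + \sqrt{-3}$ is not an associate of $2$. The Python sketch below (all names ours, with $(a, b)$ standing for $a + b\sqrt{-3}$) checks these facts by a finite search.

```python
from typing import List, Tuple

Elt = Tuple[int, int]  # (a, b) stands for a + b*sqrt(-3)


def norm(z: Elt) -> int:
    a, b = z
    return a * a + 3 * b * b


def mul(z: Elt, w: Elt) -> Elt:
    a, b = z
    c, d = w
    return (a * c - 3 * b * d, a * d + b * c)


def elements_of_norm(n: int) -> List[Elt]:
    """All a + b*sqrt(-3) with a^2 + 3b^2 = n (finite search, |a|, |b| <= sqrt(n))."""
    r = int(n ** 0.5) + 1
    return [(a, b) for a in range(-r, r + 1) for b in range(-r, r + 1) if norm((a, b)) == n]


print(mul((1, 1), (1, -1)))                          # (4, 0)
print([norm(z) for z in [(2, 0), (1, 1), (1, -1)]])  # [4, 4, 4]
print(elements_of_norm(2))                           # []
print(elements_of_norm(1))                           # [(-1, 0), (1, 0)]
```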

A more significant example, taking advantage of Proposition 8.41, is that any polynomial ring $\mathbb{K}[X_1, \dots, X_n]$ over a field $\mathbb{K}$ is integrally closed. In fact, we know from Section 5 that $\mathbb{K}[X_1, \dots, X_n]$ has unique factorization.

Proposition 8.42. Let $R$ be an integral domain, $F$ be its field of fractions, and $K$ be any field containing $F$. If $\dim_F K < \infty$, then any $x$ in $K$ has the property that there is some $c \ne 0$ in $R$ such that $cx$ is integral over $R$.

Remarks. Consequently $K$ may be regarded as the field of fractions of the integral closure $T$ of $R$ in $K$. In fact, let $\{x_i\}$ be a basis of $K$ over $F$, and choose $c_i \ne 0$ in $R$ for each $i$ such that $y_i = c_ix_i$ is integral over $R$. Then $\{y_i\}$ is a basis for $K$ over $F$ consisting of members of $T$, and it follows that every member of $K$ is the quotient of a member of $T$ by a member of $R$. Proposition 8.6 supplies a one-one ring homomorphism of the field of fractions for $T$ into $K$, and the description just given for the elements of $K$ shows that this homomorphism is onto $K$. Therefore $K$ may be regarded as the field of fractions of $T$.

Proof. Since $\dim_F K < \infty$, the elements $1, x, x^2, \dots$ of $K$ are linearly dependent over $F$. Therefore $a_nx^n + \cdots + a_1x + a_0 = 0$ for a suitable $n$ and for suitable members of $F$ with $a_n \ne 0$. Clearing fractions, we may assume that $a_n, \dots, a_1, a_0$ are in $R$ and that $a_n \ne 0$. Multiplying the equation by $a_n^{n-1}$, we obtain

$$(a_nx)^n + a_{n-1}(a_nx)^{n-1} + \cdots + a_1a_n^{n-2}(a_nx) + a_0a_n^{n-1} = 0.$$

Thus we can take $c = a_n$. □
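The last step of the proof is a simple transformation of coefficients: if $a_nx^n + \cdots + a_1x + a_0 = 0$ with the $a_i$ in $R$, then $y = a_nx$ satisfies the monic equation whose coefficient of $y^i$ is $a_ia_n^{n-1-i}$ for $i < n$. A Python sketch over $R = \mathbb{Z}$, offered only as an illustration (the names `scale_to_monic` and `evaluate` are ours), with coefficient lists written constant term first:

```python
from fractions import Fraction
from typing import List


def scale_to_monic(coeffs: List[int]) -> List[int]:
    """Given [a_0, ..., a_n] with a_n != 0 and a_n x^n + ... + a_0 = 0, return the
    coefficients of the monic polynomial satisfied by y = a_n * x, namely
    y^n + a_{n-1} y^{n-1} + a_{n-2} a_n y^{n-2} + ... + a_0 a_n^{n-1}."""
    n = len(coeffs) - 1
    lead = coeffs[-1]
    if n < 1 or lead == 0:
        raise ValueError("need degree >= 1 and a nonzero leading coefficient")
    return [coeffs[i] * lead ** (n - 1 - i) for i in range(n)] + [1]


def evaluate(coeffs: List[int], x: Fraction) -> Fraction:
    """Horner evaluation of a coefficient list (constant term first) at x."""
    total = Fraction(0)
    for c in reversed(coeffs):
        total = total * x + c
    return total


# x = -5/3 is a root of 3x^2 + 2x - 5 = (3x + 5)(x - 1).
p = [-5, 2, 3]
x = Fraction(-5, 3)
q = scale_to_monic(p)
print(q)                   # [-15, 2, 1]
print(evaluate(q, 3 * x))  # 0
```

Here $y = 3x = -5$ is an integer root of $y^2 + 2y - 15$, in line with Example 1 of integral closures.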
In the base rings $\mathbb{Z}$ and $\mathbb{K}[x]$ of our examples, every nonzero prime ideal is maximal because the rings are principal ideal domains. In Section 7 we mentioned that every nonzero prime ideal in $\mathbb{Z}[\sqrt{-5}\,]$ is maximal even though $\mathbb{Z}[\sqrt{-5}\,]$ is not a principal ideal domain. The remainder of this section, particularly Proposition 8.45, shows that the feature that every nonzero prime ideal is maximal is always preserved in our passage from $R$ to $T$.

Proposition 8.43. Let $R$ be an integral domain, $F$ be its field of fractions, $K$ be any field containing $F$, and $T$ be the integral closure of $R$ in $K$. If $Q$ is a nonzero prime ideal of $T$, then $P = R \cap Q$ is a nonzero prime ideal of $R$.

Remarks.  Corollary  8.38  shows  that  T  is  a  ring.  A  construction  for  prime 
ideals  that  goes  in  the  reverse  direction,  from  R  to  T,  appears  below  as  Proposition 
8.53. 

PROOF. Let $Q$ be a nonzero prime ideal of $T$, and put $P = R \cap Q$. The ideal
$P$ is proper since $1$ is not in $Q$ and therefore cannot be in $P$. It is prime since $xy \in P$
implies that $xy$ is in $Q$, hence that $x$ or $y$ is in $Q$, and hence that $x$ or $y$ is in $R \cap Q = P$.
To see that $P$ is nonzero, take $t \ne 0$ in $Q$. Since $t$ is integral over $R$, $t$ satisfies some monic
polynomial equation $t^n + a_{n-1}t^{n-1} + \cdots + a_1t + a_0 = 0$ with coefficients in $R$.
Without loss of generality, $a_0 \ne 0$, since otherwise we could divide the equation
by a positive power of $t$ (legitimate because $T$ is an integral domain and $t \ne 0$).
Then $a_0 = -t\,(t^{n-1} + a_{n-1}t^{n-2} + \cdots + a_1)$ exhibits $a_0$
as in $Q$ as well as in $R$. Thus $P$ is nonzero. □
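To see the last step of the proof in a concrete case, take (as an illustrative assumption) $R = \mathbb Z$ and $K = \mathbb Q(i)$, so that $T = \mathbb Z[i]$. Let $Q$ be the prime ideal generated by $t = 2 + i$, which is prime because its norm $5$ is prime. The minimal sketch below uses a hypothetical helper `cofactor_of_constant_term` to compute the cofactor $-(t^{n-1} + a_{n-1}t^{n-2} + \cdots + a_1)$ by Horner's rule. It confirms that $a_0 = 5$ is a multiple of $t$, so $5 \in P = \mathbb Z \cap Q$. Python complex numbers are used only because the small integer parts involved are represented exactly.

```python
def cofactor_of_constant_term(t, coeffs):
    """For a monic t^n + a_{n-1} t^{n-1} + ... + a_1 t + a_0 = 0 with
    coeffs = [a_0, a_1, ..., a_{n-1}], return c = -(t^{n-1} + a_{n-1} t^{n-2} + ... + a_1),
    so that a_0 = t * c exhibits a_0 as a multiple of t."""
    c = 1
    # Horner's rule over a_{n-1}, ..., a_1 builds t^{n-1} + a_{n-1} t^{n-2} + ... + a_1.
    for a in reversed(coeffs[1:]):
        c = c * t + a
    return -c

# The Gaussian integer t = 2 + i satisfies t^2 - 4t + 5 = 0 (trace 4, norm 5).
t = complex(2, 1)
coeffs = [5, -4]                        # a_0 = 5, a_1 = -4
assert t * t + coeffs[1] * t + coeffs[0] == 0
c = cofactor_of_constant_term(t, coeffs)
print(c, t * c)                         # (2-1j) (5+0j)
```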

Lemma 8.44. Let $R$ and $T$ be integral domains with $R \subseteq T$ and with every
element of $T$ integral over $R$. If $T'$ is an integral domain and $\varphi : T \to T'$ is a
homomorphism of rings onto $T'$, then every member of $T'$ is integral over $\varphi(R)$.

PROOF. If $t$ is in $T$, then $t$ satisfies some monic polynomial equation of the
form $t^n + a_{n-1}t^{n-1} + \cdots + a_1t + a_0 = 0$ with coefficients in $R$. Applying $\varphi$ to this
equation, we see that $\varphi(t)$ satisfies a monic polynomial equation with coefficients
in $\varphi(R)$. Since $\varphi$ is onto, every member of $T'$ is of the form $\varphi(t)$. □

Proposition 8.45. Let $R$ be an integral domain, $F$ be its field of fractions,
$K$ be any field containing $F$, and $T$ be the integral closure of $R$ in $K$. If every
nonzero prime ideal of $R$ is maximal, then every nonzero prime ideal of $T$ is
maximal.

Remark.  As  with  Proposition  8.43,  Corollary  8.38  shows  that  T  is  a  ring. 

Proof. Let $Q$ be a nonzero prime ideal in $T$, and let $P = R \cap Q$.
Since $P$ is a nonzero prime ideal of $R$ by Proposition 8.43, the hypotheses say that
$P$ is maximal in $R$. We shall apply Lemma 8.44 to the quotient homomorphism
$T \to T/Q$. The lemma says that every element of the integral domain $T/Q$ is
integral over the subring $(R + Q)/Q$. Composing the inclusion homomorphism
$R \to T$ with the homomorphism $T \to T/Q$ yields a ring homomorphism
$R \to T/Q$ that carries $P$ into the $0$ coset. Since $P = R \cap Q$, this ring
homomorphism descends to a one-one ring homomorphism $R/P \to T/Q$. The
Second Isomorphism Theorem (for abelian groups) identifies the image of $R/P$
with $(R + Q)/Q$. Since $P$ is maximal as an ideal in $R$, $R/P$ is a field. The
ring isomorphism $R/P \cong (R + Q)/Q$ thus shows that every element of $T/Q$ is
integral over a field.

Let us write $k$ for this field isomorphic to $R/P$, and let $k'$ be the field of fractions
of $T/Q$. We can now argue as in the proof of Proposition 4.1. If $x \ne 0$ is in $T/Q$,
then $x$ satisfies a monic polynomial equation $x^m + c_{m-1}x^{m-1} + \cdots + c_1x + c_0 = 0$
with coefficients in $k$, and we may assume that $c_0 \ne 0$. Then the equality
$x^{-1} = -c_0^{-1}(c_1 + \cdots + c_{m-1}x^{m-2} + x^{m-1})$ shows that the member $x^{-1}$ of $k'$ is
in fact in $T/Q$. Therefore $T/Q$ is a field, and the ideal $Q$ is maximal in $T$. □
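The inversion formula can be tested mechanically. The sketch below is a minimal illustration that assumes the field $k$ is $\mathbb Q$, represented exactly with `fractions.Fraction`, rather than a general field isomorphic to $R/P$. The helpers `inverse_as_polynomial` and `evaluate` are hypothetical names. The code returns the coefficients of $x^{-1}$ as a polynomial in $x$, then compares numerically at the golden ratio, a root of $x^2 - x - 1$.

```python
import math
from fractions import Fraction

def inverse_as_polynomial(c):
    """c = [c_0, ..., c_{m-1}] are the coefficients (in Q) of a monic
    x^m + c_{m-1} x^{m-1} + ... + c_1 x + c_0 = 0 with c_0 != 0.  Return
    d = [d_0, ..., d_{m-1}] with x^{-1} = d_0 + d_1 x + ... + d_{m-1} x^{m-1},
    following x^{-1} = -c_0^{-1}(c_1 + c_2 x + ... + c_{m-1} x^{m-2} + x^{m-1})."""
    c0 = Fraction(c[0])
    if c0 == 0:
        raise ValueError("c_0 must be nonzero; divide out a power of x first")
    tail = [Fraction(v) for v in c[1:]] + [Fraction(1)]  # c_1, ..., c_{m-1}, 1
    return [-v / c0 for v in tail]

def evaluate(d, x):
    """Numerically evaluate d_0 + d_1 x + ... at a float x."""
    return sum(float(v) * x ** k for k, v in enumerate(d))

# The golden ratio satisfies x^2 - x - 1 = 0, so c = [-1, -1] and x^{-1} = x - 1.
d = inverse_as_polynomial([-1, -1])
print(d)                                              # [Fraction(-1, 1), Fraction(1, 1)]
golden = (1 + math.sqrt(5)) / 2
print(math.isclose(evaluate(d, golden), 1 / golden))  # True
```

The floating-point comparison is only a spot check. The exact content is the coefficient list, which says $x^{-1} = x - 1$.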


10.  Localization  and  Local  Rings 

In this section, $R$ denotes a commutative ring with identity. The objective is to
enlarge or at least adjust $R$ so as to make further elements of $R$ become invertible
under multiplication. The prototype is the construction of the field of fractions
for an integral domain. A subset $S$ of $R$ is called a multiplicative system if $1$
is in $S$ and if the product of any two members of $S$ is in $S$. The multiplicative
system will be used as a set of new allowable denominators, and the new ring will
be denoted$^{11}$ by $S^{-1}R$.

The construction proceeds along the same lines as in Section 2, except that
some care is needed to take into account the possibility of zero divisors in $R$ and
even in $S$. We begin with an intermediate set

$$\widetilde R = \{(r, s) \mid r \in R,\ s \in S\}$$

and impose the relation $(r, s) \sim (r', s')$ if $t(rs' - sr') = 0$ for some $t \in S$. The relation
is clearly reflexive and symmetric. To check transitivity, suppose that $(r, s) \sim (r', s')$
and $(r', s') \sim (r'', s'')$. Then we have $t(rs' - sr') = 0$ and $t'(r's'' - s'r'') = 0$ for
some $t$ and $t'$ in $S$, and hence

$$s'tt'(rs'' - sr'') = s''t'\bigl(t(rs' - sr')\bigr) + st\bigl(t'(r's'' - s'r'')\bigr) = 0.$$

Since $s'tt'$ is in $S$, $(r, s) \sim (r'', s'')$. Thus $\sim$ is an equivalence relation.


$^{11}$Some authors write $R_S$ instead of $S^{-1}R$.
The set of equivalence classes is denoted by $S^{-1}R$ and is called the localization$^{12}$
of $R$ with respect to $S$. Addition and multiplication are defined in $\widetilde R$ by
$(r, s) + (r', s') = (rs' + sr', ss')$ and $(r, s)(r', s') = (rr', ss')$. Simple variants
of the arguments in Section 2 show that these operations descend to operations
on $S^{-1}R$. For example, with addition let $(r, s)$, $(r', s')$, and $(r'', s'')$ be in $\widetilde R$ with
$(r', s') \sim (r'', s'')$, i.e., with $t'(r's'' - s'r'') = 0$ for some $t' \in S$. Then the
equivalence

$$(r, s) + (r', s') = (rs' + sr', ss') \sim (rs'' + sr'', ss'') = (r, s) + (r'', s'')$$

holds because

$$t'\bigl((rs' + sr')ss'' - (rs'' + sr'')ss'\bigr) = s^2t'(r's'' - s'r'') = 0.$$

Similarly multiplication is well defined.

The result is that $S^{-1}R$ is a commutative ring with identity and that the mapping
$r \mapsto r^*$, where $r^*$ is the class of $(r, 1)$, is a ring homomorphism of $R$ into $S^{-1}R$
carrying $1$ to $1$. Let us observe the following simple properties of $S^{-1}R$:

(i) $S^{-1}R = 0$ if and only if $0$ is in $S$, since $S^{-1}R = 0$ if and only if
$(1, 1) \sim (0, 1)$, if and only if $t(1 \cdot 1 - 1 \cdot 0) = 0$ for some $t \in S$.

(ii) $r \mapsto r^*$ is one-one if and only if $S$ contains no zero divisors, since $r^* = 0$
if and only if $(r, 1) \sim (0, 1)$, if and only if $tr = 0$ for some $t \in S$.

(iii) $s^*$ is a unit in $S^{-1}R$ for each $s \in S$, since the class of $(1, s)$ is a multiplicative
inverse for $s^*$.

(iv) every member of $S^{-1}R$ is of the form $(s^*)^{-1}r^*$ for some $r \in R$ and $s \in S$,
since $(r, s) = (r, 1)(1, s)$ is the class of $r^*(s^*)^{-1}$.

(v) $S^{-1}R$ is an integral domain if $R$ is an integral domain and $0$ is not in $S$.
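The relation $\sim$ and observations (i) and (ii) can be explored by brute force in a finite ring. The following minimal Python sketch assumes $R = \mathbb Z/n$. It takes $S$ to be a list of residues that the caller has already checked to be a multiplicative system, for instance $\{1, 2, 4\}$ in $\mathbb Z/6$, or the powers $\{1, 3, 0\}$ of $u = 3$ in $\mathbb Z/9$ as in Example (5) with $p = 3$. The function names are hypothetical. The code counts the equivalence classes, that is, the elements of $S^{-1}R$.

```python
from itertools import product

def related(n, S, a, b):
    """(r, s) ~ (r2, s2) in Z/n x S iff t*(r*s2 - s*r2) = 0 in Z/n for some t in S."""
    (r, s), (r2, s2) = a, b
    return any((t * (r * s2 - s * r2)) % n == 0 for t in S)

def localization_classes(n, S):
    """Partition the pairs (r, s), r in Z/n and s in S, into ~-classes.
    Comparing with one representative per class suffices because ~ is an
    equivalence relation."""
    classes = []
    for pair in product(range(n), S):
        for cls in classes:
            if related(n, S, pair, cls[0]):
                cls.append(pair)
                break
        else:
            classes.append([pair])
    return classes

def canonical_map_is_one_one(n, S):
    """Observation (ii): r -> class of (r, 1) is one-one iff no nonzero r has r* = 0."""
    return not any(related(n, S, (r, 1), (0, 1)) for r in range(1, n))

print(len(localization_classes(6, [1, 2, 4])))  # 3: S^{-1}(Z/6) has three elements
print(len(localization_classes(9, [1, 3, 0])))  # 1: 0 is in S, so S^{-1}R = 0
print(canonical_map_is_one_one(6, [1, 2, 4]))   # False: 2 in S is a zero divisor
print(canonical_map_is_one_one(6, [1, 5]))      # True
```

In $\mathbb Z/6$ with $S = \{1, 2, 4\}$, the relation reduces to $rs' \equiv sr' \pmod 3$. Therefore $S^{-1}R$ has three elements even though $R$ has six. The canonical map fails to be one-one because $2 \cdot 3 = 0$ with $2 \in S$.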

In working with localizations, we shall normally drop the superscript $*$ on the
image $r^*$ in $S^{-1}R$ of an element $r$ of $R$.

Localizations arise in algebraic number theory and in algebraic geometry. In
applications to algebraic number theory, the ring $R$ typically is an integral domain,
and therefore the map $r \mapsto r^*$ is one-one. In applications to algebraic geometry,
$S$ may have zero divisors.

Examples  of  localizations. 

(1) $R$ is arbitrary, and $S = \{1\}$. Then $S^{-1}R = R$.

$^{12}$Some authors use a term like “ring of fractions” or “ring of quotients” in connection with
localization in the general case or in some special cases. We shall not use these terms. In any event,
“ring of quotients” is emphatically not to be confused with “quotient ring” as in Chapter IV, which
is the coset space of a ring modulo an ideal.
(2) $R$ is arbitrary, and $S = \{\text{nonzero elements that are not zero divisors in } R\}$.
Then every nonzero element of $S^{-1}R$ is a zero divisor or is a unit. In this example,
when $S$ consists of all members of $R$ other than $0$, then $R$ is an integral domain
and $S^{-1}R$ is the field of fractions of $R$.

(3) $R$ is arbitrary, $P$ is a prime ideal in $R$, and $S$ is the set-theoretic complement
of $P$. The identity is in $S$ since $P$ is proper. The prime nature of $P$ is used in
checking that $S$ is a multiplicative system: if $s$ and $t$ are in $S$, then neither is in
$P$, by definition, and their product $st$ cannot be in $P$ since $P$ is prime; thus the
product $st$ is in $S$. With these definitions, $S^{-1}R$ is often denoted by $R_P$
and is called the localization of $R$ at the prime $P$. In practice this is the most
important example of a localization,$^{13}$ directly generalizing the construction of
the field of fractions of an integral domain as the localization at the prime ideal $0$.
Here are some special cases, $K$ being a field in the cases in which it occurs:

(a) When $R = \mathbb Z$ and $P = (p)$ for a prime number $p$, the set $S$ consists
of the nonzero integers not divisible by $p$, and $R_P$ is the subset of all members of $\mathbb Q$
whose denominators, in lowest terms, are not divisible by $p$.

(b) When $R = K[X]$ and $P = (X - c)$, the set $S$ consists of all polynomials
that are nonvanishing at $c$, and $R_P$ is the set of formal rational expressions in $X$
that are finite at $c$.

(c) When $R = K[X, Y]$ and $P = (X - c, Y - d)$, the set $S$ consists of
all polynomials in $X$ and $Y$ that are nonvanishing at $(c, d)$, and $R_P$ is the set of
formal rational expressions in $X$ and $Y$ that are finite at $(c, d)$.

(d) When $R = K[X, Y]$ and $P = (X)$, the set $S$ consists of all polynomials
in $X$ and $Y$ that are not divisible by $X$, and $R_P$ is the set of formal rational
expressions in $X$ and $Y$ that are meaningful as rational expressions in $Y$ when $X$
is set equal to $0$. For example, $1/(X + Y)$ is in $R_P$, but $1/X$ is not.

(4) $R$ is arbitrary, $\{P_\alpha\}$ is a nonempty collection of prime ideals, and $S$ is the
set of all elements of $R$ that lie in none of the ideals $P_\alpha$. Then $S^{-1}R$ may be
regarded as the localization of $R$ at the set of all primes $P_\alpha$.

(5) $R$ is arbitrary, $u$ is an element of $R$, and $S = \{1, u, u^2, \dots\}$. For example,
if $R = \mathbb Z/(p^2)$, where $p$ is a prime, and if $u = p$, then $0$ is in $S$, and observation
(i) shows that $S^{-1}R = 0$.

(6) $R$ is a Noetherian integral domain, $E$ is an arbitrary set of nonzero elements
of $R$, and $S$ is the set of all finite products of members of $E$, including the element
$1$ as the empty product. Let us see that the same $S^{-1}R$ results when $E$ is replaced
by a certain set $E'$ of units and irreducible elements of $R$, namely the union
of $R^\times$ and the set of all irreducible elements $x$ in $R$ such that $x^{-1}$ is in $S^{-1}R$.
Define $T$ to be the set of all finite products of members of $E'$. We show that
$S^{-1}R = T^{-1}R$. If $e$ is in $E'$, then either $e$ is a unit in $R$, in which case $e^{-1}$
lies in $R$ and therefore also in $S^{-1}R$, or $e$ is irreducible in $R$ with $e^{-1}$ in $S^{-1}R$.
Passing to finite products of members of $E'$, we see that $T^{-1} \subseteq S^{-1}R$. Hence
$T^{-1}R \subseteq S^{-1}R$. Now let $s$ be in $S$, and use Proposition 8.33 to write $s$ as a
product of irreducible elements $s = s_1 \cdots s_n$. Then $s_j^{-1} = s^{-1}(s_1 \cdots \widehat{s_j} \cdots s_n)$,
with $\widehat{s_j}$ indicating a missing factor. By construction, each $s_j$ is in $E'$. Therefore
each $s_j$ is in $T$, and $s$ is in $T$. Consequently $S \subseteq T$, and $S^{-1}R \subseteq T^{-1}R$.

$^{13}$Beware of confusing $R_P$ with $R/P$. The ring $R_P$ is obtained by suitably enlarging $R$, at least
in the case that $R$ is an integral domain, whereas the ring $R/P$ is obtained by suitably factoring
something out from $R$.

The localization of $R$ at $S$ is characterized up to canonical isomorphism by the
same kind of universal mapping property that characterizes the field of fractions
of an integral domain. To formulate a proposition, let us write $\eta$ for the homomorphism
$r \mapsto r^*$ of $R$ into $S^{-1}R$. Then the pair $(S^{-1}R, \eta)$ has the universal
mapping property stated in Proposition 8.46 and illustrated in Figure 8.7.

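[Diagram: $\eta : R \to S^{-1}R$ and $\varphi : R \to T$, together with the induced map $\widetilde\varphi : S^{-1}R \to T$]

FIGURE 8.7. Universal mapping property of the localization of $R$ at $S$.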

Proposition 8.46. Let $R$ be a commutative ring with identity, let $S$ be a
multiplicative system in $R$, let $S^{-1}R$ be the localization of $R$ at $S$, and let $\eta$ be the
canonical homomorphism of $R$ into $S^{-1}R$. Whenever $\varphi$ is a ring homomorphism
of $R$ into a commutative ring $T$ with identity such that $\varphi(1) = 1$ and such that
$\varphi(s)$ is a unit in $T$ for each $s \in S$, then there exists a unique ring homomorphism
$\widetilde\varphi : S^{-1}R \to T$ such that $\varphi = \widetilde\varphi\eta$.

Proof. If $(r, s)$ with $s \in S$ is a pair in $\widetilde R$, we define $\Phi(r, s) = \varphi(r)\varphi(s)^{-1}$.
This is well defined since $\varphi(s)$ is assumed to be a unit in $T$. Let us see that
$\Phi$ is consistent with the equivalence relation, i.e., that $(r, s) \sim (r', s')$ implies
$\Phi(r, s) = \Phi(r', s')$. Since $(r, s) \sim (r', s')$, we have $u(rs' - r's) = 0$ for some
$u \in S$, and therefore also $\varphi(u)\bigl(\varphi(r)\varphi(s') - \varphi(r')\varphi(s)\bigr) = 0$. Since $\varphi(u)$ is a
unit, $\varphi(r)\varphi(s') = \varphi(r')\varphi(s)$. Hence
$\Phi(r, s) = \varphi(r)\varphi(s)^{-1} = \varphi(r')\varphi(s')^{-1} = \Phi(r', s')$, as required.

We can thus define $\widetilde\varphi$ of the class of $(r, s)$ to be $\Phi(r, s)$, and $\widetilde\varphi$ is well defined
as a function from $S^{-1}R$ to $T$. It is a routine matter to check that $\widetilde\varphi$ is a ring
homomorphism. If $r$ is in $R$, then
$\widetilde\varphi(\eta(r)) = \widetilde\varphi(\text{class of }(r, 1)) = \Phi(r, 1) = \varphi(r)\varphi(1)^{-1}$,
and this equals $\varphi(r)$ since $\varphi$ is assumed to carry $1$ into $1$. Therefore
$\widetilde\varphi\eta = \varphi$.

For uniqueness, observation (iv) shows that the most general element of $S^{-1}R$
is of the form $\eta(r)\eta(s)^{-1}$ with $r \in R$ and $s \in S$. Since $(\widetilde\varphi\eta)(r) = \varphi(r)$
and $(\widetilde\varphi\eta)(s) = \varphi(s)$, we must have
$\widetilde\varphi\bigl(\eta(r)\eta(s)^{-1}\bigr) = \widetilde\varphi(\eta(r))\,\widetilde\varphi(\eta(s))^{-1} = \varphi(r)\varphi(s)^{-1}$.
Therefore $\varphi$ uniquely determines $\widetilde\varphi$. □
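As an illustration of Proposition 8.46, consider one concrete case chosen only for this example. Take $R = \mathbb Z$ and $S$ the integers not divisible by $5$, so that $S^{-1}R = \mathbb Z_{(5)}$. Let $T = \mathbb Z/5$ and let $\varphi$ be reduction modulo $5$. The minimal sketch below, with hypothetical function names, implements $\Phi(r, s) = \varphi(r)\varphi(s)^{-1}$ and the induced map $\widetilde\varphi$. The three-argument `pow(x, -1, m)` for modular inverses requires Python 3.8 or later.

```python
from fractions import Fraction

P = 5  # phi : Z -> Z/5 is reduction mod 5; S is the set of integers prime to 5

def phi(r: int) -> int:
    return r % P

def Phi(r: int, s: int) -> int:
    """Phi(r, s) = phi(r) * phi(s)^{-1}; phi(s) is a unit of Z/5 since 5 does not divide s."""
    if s % P == 0:
        raise ValueError("s must lie in S")
    return phi(r) * pow(phi(s), -1, P) % P

def phi_tilde(q: Fraction) -> int:
    """Induced map on Z_(5): apply Phi to the lowest-terms representative of q."""
    return Phi(q.numerator, q.denominator)

# Equivalent pairs give the same value, and phi_tilde(eta(r)) = phi(r).
print(Phi(3, 2), Phi(6, 4), Phi(-9, -6))                              # 4 4 4
print(all(phi_tilde(Fraction(r)) == phi(r) for r in range(-20, 21)))  # True
```

Checking additivity and multiplicativity of `phi_tilde` on a finite sample of fractions is a spot check, not a proof.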

We shall examine the relationship between ideals in $R$ and ideals in the localization
$S^{-1}R$. If $I$ is an ideal in $R$, then $S^{-1}I = \{s^{-1}i \mid s \in S,\ i \in I\}$ is easily
checked to be an ideal in $S^{-1}R$ and is called the extension of $I$ to $S^{-1}R$. If $J$
is an ideal in $S^{-1}R$, then $R \cap J$, i.e., the inverse image of $J$ under the canonical
homomorphism $\eta : R \to S^{-1}R$, is an ideal in $R$ and is called the contraction
of $J$.

Proposition 8.47. Let $R$ be a commutative ring with identity, and let $S^{-1}R$
be a localization. If $J$ is an ideal in $S^{-1}R$, then $S^{-1}(R \cap J) = J$. Consequently
the mapping $I \mapsto S^{-1}I$ is a one-one mapping of the set of all ideals $I$ in $R$ of
the form $I = R \cap J$ onto the set of all ideals in $S^{-1}R$, and this mapping respects
intersections and inclusions.

Remarks. As in the definition of contraction, $R \cap J$ means $\eta^{-1}(J)$, where
$\eta : R \to S^{-1}R$ is the canonical homomorphism. The map $I \mapsto S^{-1}I$ that carries
arbitrary ideals of $R$ to ideals of $S^{-1}R$ need not be one-one; the localization could,
for example, be the field of fractions of an integral domain and have only trivial
ideals. The proposition says that the map $I \mapsto S^{-1}I$ is one-one, however, when
restricted to ideals of the form $I = R \cap J$.

PROOF. From the facts that $R \cap J \subseteq J$ and $J$ is an ideal in $S^{-1}R$, we obtain
$S^{-1}(R \cap J) \subseteq S^{-1}J \subseteq J$. For the reverse inclusion let $x$ be in $J$, and write
$x = s^{-1}r$ with $r$ in $R$ and $s$ in $S$. Then $sx = r$ is in $R \cap J$, and therefore $x$ is in
$S^{-1}(R \cap J)$.

For the conclusion about the mapping $I \mapsto S^{-1}I$, the mapping is one-one
because $S^{-1}(R \cap J_1) = S^{-1}(R \cap J_2)$ implies $J_1 = J_2$ by what we have just
shown; hence $R \cap J_1 = R \cap J_2$. The mapping is onto because if $J$ is given,
then $J = S^{-1}(R \cap J)$ by what has already been shown. To see that the mapping
respects the intersection of ideals, let ideals $R \cap J_\alpha$ be given for $\alpha$ in some
nonempty set. Then

$$S^{-1}\Bigl(\bigcap_\alpha (R \cap J_\alpha)\Bigr) = S^{-1}\Bigl(R \cap \bigcap_\alpha J_\alpha\Bigr) = \bigcap_\alpha J_\alpha = \bigcap_\alpha S^{-1}(R \cap J_\alpha).$$

Finally the fact that the mapping respects the intersection of two ideals implies
that it respects inclusions. □
Corollary 8.48. Let $R$ be a commutative ring with identity, and let $S^{-1}R$ be
a localization.

(a) If $R$ is Noetherian, then $S^{-1}R$ is Noetherian.

(b) If every nonzero prime ideal in $R$ is maximal, then the same thing is true
in $S^{-1}R$.

(c) If $R$ is an integral domain that is integrally closed and if $S^{-1}R$ is not zero,
then $S^{-1}R$ is integrally closed.

(d) If $I$ is an ideal in $R$, then the ideal $S^{-1}I$ of $S^{-1}R$ is proper if and only if
$I \cap S = \varnothing$.

PROOF. For (a), let $\{J_\alpha\}$ be a nonempty collection of ideals in $S^{-1}R$. Contraction
of ideals is one-one by the first conclusion of Proposition 8.47, and it
respects inclusions because it is given by the inverse image of a function. Since $R$
is Noetherian, Corollary 8.31b produces a maximal element $R \cap J$ from among
the ideals $R \cap J_\alpha$ of $R$, where $J$ is one of the $J_\alpha$. If $J \subseteq J_\alpha$ for some $\alpha$,
then $R \cap J \subseteq R \cap J_\alpha$, hence $R \cap J = R \cap J_\alpha$ by maximality, and the first
conclusion of Proposition 8.47 gives $J = S^{-1}(R \cap J) = S^{-1}(R \cap J_\alpha) = J_\alpha$.
Hence $J$ is maximal among the $J_\alpha$.

For (b), let $J_1$ be a nonzero prime ideal in $S^{-1}R$. Arguing by contradiction,
suppose that $J_2$ is an ideal in $S^{-1}R$ with $J_1 \subsetneq J_2 \subsetneq S^{-1}R$. Then
$R \cap J_1 \subseteq R \cap J_2 \subseteq R$. If either of these inclusions were an equality, then use of the second
conclusion of Proposition 8.47 would give a corresponding equality for $J_1$, $J_2$, $S^{-1}R$,
and there is no such equality. Hence $R \cap J_1 \subsetneq R \cap J_2 \subsetneq R$. Also $R \cap J_1 \ne 0$,
since $S^{-1}(R \cap J_1) = J_1 \ne 0$ by Proposition 8.47.

If $J_1$ is prime in $S^{-1}R$, then $R \cap J_1$ is prime in $R$: In fact, if $a$ and $b$ are
members of $R$ such that $ab$ is in $R \cap J_1$, then $ab$ is in $J_1$, and either $a$ or $b$ must be
in $J_1$ since $J_1$ is prime. Since $a$ and $b$ are both in $R$, one of $a$ and $b$ is in $R \cap J_1$.
Thus $R \cap J_1$ is prime.$^{14}$

By assumption for (b), $R \cap J_1$ is then maximal in $R$, and this conclusion
contradicts the fact that $R \cap J_1 \subsetneq R \cap J_2 \subsetneq R$. The assumption that $J_2$ exists has
thus led us to a contradiction. Consequently there can be no such $J_2$, and $J_1$ is a
maximal ideal in $S^{-1}R$.

For (c), let $F$ be the field of fractions of $R$, so that $R \subseteq S^{-1}R \subseteq F$. The field
of fractions of $S^{-1}R$ is the field $F$ as a consequence of Proposition 8.6. If $x$ is a
member of $F$ that is integral over $S^{-1}R$ and if $x$ satisfies $x^n + b_{n-1}x^{n-1} + \cdots + b_0 = 0$
with coefficients in $S^{-1}R$, then we can find a common element $s$ of $S$ and rewrite
this equation as

$$x^n + (s^{-1}a_{n-1})x^{n-1} + \cdots + (s^{-1}a_0) = 0$$

with $a_{n-1}, \dots, a_0$ in $R$. Multiplying by $s^n$, we obtain

$$(sx)^n + a_{n-1}(sx)^{n-1} + \cdots + a_1s^{n-2}(sx) + a_0s^{n-1} = 0.$$

Therefore $sx$ is integral over $R$. Since $R$ is integrally closed, $sx$ is in $R$. Write
$r = sx$. Then $x = s^{-1}r$ with $r$ in $R$ and $s$ in $S$. Hence $x$ is exhibited as in $S^{-1}R$,
and we conclude that $S^{-1}R$ is integrally closed.

$^{14}$Problem 9 at the end of the chapter puts this argument in a broader context.

For (d), suppose that $I \cap S$ is nonempty. If $s$ is in $I \cap S$, then $1 = s^{-1}s$ is
in $S^{-1}I$ and the ideal $S^{-1}I$ equals $S^{-1}R$. Conversely if $S^{-1}I = S^{-1}R$, then $1$
is in $S^{-1}I = \{s^{-1}i \mid s \in S,\ i \in I\}$, and hence $1 = s^{-1}i$ for some $s$ and $i$,
i.e., $(i, s) \sim (1, 1)$, so that $t(i - s) = 0$ for some $t \in S$. Consequently $I \cap S$
contains the element $ti = ts$. □

A local ring is a commutative ring with identity having a unique maximal
ideal. An equivalent definition is given in Proposition 8.49 below. In the special case
of Example 2 earlier in this section in which $R$ is an integral domain, the localization
$S^{-1}R$ is a field and hence a local ring, its unique maximal ideal being $0$. Corollary 8.50
below will produce a more useful example: localization with respect to a prime ideal,
as in Example 3 earlier, always yields a local ring.$^{15}$

Proposition  8.49.  A  nonzero  commutative  ring  R  with  identity  is  a  local  ring 
if  and  only  if  the  nonunits  of  R  form  an  ideal. 

Remark.  The  zero  ring  is  not  local,  having  no  proper  ideals,  and  its  set  of 
nonunits  is  empty,  hence  is  not  an  ideal. 

PROOF. If the nonunits of $R$ form an ideal, then that ideal is a unique maximal
ideal since a proper ideal cannot contain a unit; hence $R$ is local. Conversely
suppose that $R$ is local and that $M$ is the unique maximal ideal. If $x$ is any
nonunit, then the principal ideal $(x)$ is a proper ideal since $1$ is not of the form $xr$.
By Proposition 8.8, $(x)$ is contained in some maximal ideal, and we must have
$(x) \subseteq M$ since $M$ is the unique maximal ideal. Then $x$ is in $M$, and we conclude
that every nonunit is contained in $M$. Since $M$ contains no units, the nonunits are
exactly the members of $M$ and therefore form an ideal. □

Corollary 8.50. Let $R$ be an integral domain, let $P$ be a prime ideal of $R$, let
$S$ be the set-theoretic complement of $P$, and let $R_P = S^{-1}R$ be the localization
of $R$ at $P$. Then $R_P$ is a local ring, its unique maximal ideal is $M = S^{-1}P$, and
$P$ can be recovered from $M$ as $P = R \cap M$. If $Q$ is any prime ideal of $R$ that is
not contained in $P$, then $S^{-1}Q = S^{-1}R$.

PROOF. The subset $S^{-1}P$ of $S^{-1}R$ is an ideal by Proposition 8.47, and Corollary
8.48d shows that it is proper. Every member of $S^{-1}R$ that is not in $S^{-1}P$
is of the form $s'^{-1}s$ with $s$ and $s'$ in $S$ and hence is a unit. Since no unit lies in
any proper ideal, $S^{-1}R$ has $M = S^{-1}P$ as its unique maximal ideal, and $S^{-1}R$
is local by Proposition 8.49.

$^{15}$For Example 3 with $R = K[X]$ and $P = (X - c)$, the sense in which the ring $R_P$ is “local”
has a geometric interpretation: the only spot in $K$ where we can regard members of $R_P$ as $K$-valued
functions is “near” the point $c$, with “near” depending on the element of $R_P$. See the discussion
after the proof of Corollary 8.50 below.
The contraction $R \cap M$ consists of all elements in $R$ of the form $s^{-1}p$ with $s$
in $S$ and $p$ in $P$. Let us see that the contraction equals $P$. Certainly $R \cap M \supseteq P$.
For the reverse inclusion the equation $s^{-1}p = r$ says that $p = rs$. If $r$ is not in
$P$, then the facts that $s$ is not in $P$ and $P$ is prime imply that $p = rs$ is not in $P$,
a contradiction. Thus $r$ is in $P$, and we conclude that $P$ can be recovered from $M$
as $P = R \cap M$.

If $Q$ is any prime ideal of $R$ that is not contained in $P$, then $S^{-1}Q = S^{-1}R$. In
fact, any element $q$ of $Q$ that is not in $P$ is in $S$; therefore $1 = q^{-1}q$ is in the ideal $S^{-1}Q$,
and $S^{-1}Q = S^{-1}R$. □

The construction of $R_P$ in the corollary reduces to the construction of the
field of fractions of $R$ if $P = 0$. Other interesting and typical cases occur for
suitable nonzero $P$'s when $R = K[X, Y]$, $K$ being a field. One such prime ideal
is $P = (X - c, Y - d)$; then, as was mentioned in connection with Example 3
above, the localization of $R$ at $P$ consists of the rational expressions $f(X, Y)$
that are well defined at $(c, d)$. The maximal ideal in this case consists of all such
rational expressions that are $0$ at $(c, d)$. Another example of a nonzero prime
ideal in $R = K[X, Y]$ is $P = (X)$; then the localization of $R$ at $P$ consists of
the rational expressions $f(X, Y)$ whose denominators are not divisible by $X$,
and the maximal ideal consists of all such rational expressions $f(X, Y)$ whose
numerators are divisible by $X$ if $f$ is written in lowest terms.

A number-theoretic analog of the localizations of the previous paragraph is the
localization of $R = \mathbb Z$ at $(p)$, where $p$ is a prime number. The discussion with
Example 3 above mentioned that the localization consists of all members of $\mathbb Q$
with no factor of $p$ in the denominator. In this case the maximal ideal consists
of those rationals $q$ whose numerators are divisible by $p$ if $q$ is written in lowest
terms.
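For $R = \mathbb Z$, the local ring $\mathbb Z_{(p)}$ and its maximal ideal can be tested directly. The following minimal sketch assumes $p = 3$ and uses a finite sample of rationals chosen for illustration. The function names are hypothetical. On that sample, the code checks three things:

- The nonunits are exactly the members of the maximal ideal, as Proposition 8.49 and Corollary 8.50 predict.
- This set is closed under addition.
- $1 - r$ is a unit whenever $r$ lies in the maximal ideal, which is the fact used in the proof of Nakayama's Lemma.

```python
from fractions import Fraction

def in_local_ring(q: Fraction, p: int) -> bool:
    """q lies in Z_(p) iff p does not divide its lowest-terms denominator."""
    return q.denominator % p != 0

def in_maximal_ideal(q: Fraction, p: int) -> bool:
    """Members of Z_(p) whose lowest-terms numerator is divisible by p."""
    return in_local_ring(q, p) and q.numerator % p == 0

def is_unit(q: Fraction, p: int) -> bool:
    """q is a unit of Z_(p) iff q != 0 and both q and 1/q lie in Z_(p)."""
    return q != 0 and in_local_ring(q, p) and in_local_ring(1 / q, p)

p = 3
sample = {Fraction(a, b) for a in range(-15, 16) for b in range(1, 16)}
local = [q for q in sample if in_local_ring(q, p)]
M = [q for q in local if in_maximal_ideal(q, p)]

# On this sample, the nonunits of Z_(3) are exactly the members of M ...
print(all(is_unit(q, p) != in_maximal_ideal(q, p) for q in local))  # True
# ... M is closed under addition ...
print(all(in_maximal_ideal(x + y, p) for x in M for y in M))        # True
# ... and 1 - r is a unit for every r in M.
print(all(is_unit(1 - r, p) for r in M))                            # True
```

These are finite-sample checks, not proofs.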


We conclude this section with introductory remarks about a product operation
on ideals. Let $R$ be a nonzero commutative ring with identity. If $I$ and $J$ are ideals
in $R$, then once again $IJ$ denotes$^{16}$ the set of all sums of products of a member of
$I$ by a member of $J$. Certainly $IJ$ is closed under addition and negatives, and the
fact that $r(IJ) = (rI)J \subseteq IJ$ for $r \in R$ shows that $IJ$ is an ideal. Localization
with respect to a prime ideal is a handy tool for extracting information about
products of ideals. We illustrate with Propositions 8.52 and 8.53 below. The first
of these will play an important role in Section 11.


$^{16}$Sometimes, such as in the equality $S^{-1}S^{-1} = S^{-1}$, the product notation is meant to refer only
to the set of all products, not to all sums of products. With ideals we are to allow sums of products.
The applicable convention will normally be clear from the context, but we shall be explicit when
there might be a possibility of confusion.
Lemma 8.51 (Nakayama's Lemma). Let $R$ be a commutative ring with
identity, let $I$ be an ideal of $R$ contained in all maximal ideals, and let $M$ be
a finitely generated unital $R$ module. If $IM = M$, then $M = 0$.

Remark. Here $IM$ means the set of sums of products of a member of $I$ by a
member of $M$. The lemma applies to no ideals if $R = 0$.

PROOF. We induct on the number of generators of $M$. If $M$ is singly generated,
say by a generator $m$, then the hypothesis $IM = M$ implies that $rm = m$ for
some $r$ in $I$. Thus $(1 - r)m = 0$. If $1 - r$ is a unit, then we can multiply by its
inverse and obtain $m = 0$; we conclude that $M = 0$. If $1 - r$ is not a unit, then
it lies in some maximal ideal $P$, by application of Proposition 8.8 to the proper
principal ideal $(1 - r)$. Since $r$ lies in $P$ by hypothesis, $1$ lies in $P$, and we have
a contradiction to the fact that $P$ is proper.

Suppose that the lemma holds for $n - 1$ or fewer generators, and let $M$ be
generated by $m_1, \dots, m_n$. Since $IM = M$, we have $\sum_{j=1}^n r_jm_j = m_1$ for
suitable $r_1, \dots, r_n$ in $I$. Then $(1 - r_1)m_1 = \sum_{j=2}^n r_jm_j$. If $1 - r_1$ is a unit, then
we can multiply by its inverse and see that the generator $m_1$ is unnecessary; we
conclude that $M = 0$ by induction. If $1 - r_1$ is not a unit, then it lies in some
maximal ideal $P$. Since $r_1$ lies in $P$ by hypothesis, $1$ lies in $P$, and we have a
contradiction. □

Proposition 8.52. Let $R$ be a Noetherian integral domain, and let $I$ and $P$
be ideals in $R$ with $P$ prime. If $IP = I$, then $I = 0$.

PROOF. Let us localize with respect to the prime ideal $P$. If we write $S$ for the
set-theoretic complement of $P$ in $R$, then $R_P = S^{-1}R$ is a local ring by Corollary
8.50, and its unique maximal ideal is $S^{-1}P$. Since $(S^{-1}I)(S^{-1}R) = S^{-1}(IR) = S^{-1}I$,
$S^{-1}I$ is an ideal in $R_P$. Also, $(S^{-1}I)(S^{-1}P) = S^{-1}(IP) = S^{-1}I$, and
$S^{-1}I$ is proper by Corollary 8.48d because $I = IP \subseteq P$ does not meet $S$. In Nakayama's
Lemma (Lemma 8.51), let us take $M$ to be the $S^{-1}R$ module $S^{-1}I$ and take the ideal
to be $S^{-1}P$. Since $S^{-1}P$ is the only maximal ideal in $S^{-1}R$, it is
contained in all maximal ideals of $S^{-1}R$. Since $R$ is Noetherian, Corollary 8.48a
shows $S^{-1}R$ to be Noetherian, and the ideal $S^{-1}I$ is a finitely generated $S^{-1}R$
module by Corollary 8.31c. The lemma applies since $(S^{-1}P)(S^{-1}I) = S^{-1}I$,
and the conclusion is that $S^{-1}I = 0$. Since $R$ is an integral domain, $R$ embeds in
$S^{-1}R$, and then the subset $I$ of $S^{-1}I$ must be $0$. □

Proposition 8.53. Let $R$ be an integral domain, $F$ be its field of fractions,
$K$ be any field containing $F$, and $T$ be the integral closure of $R$ in $K$. If $P$ is a
maximal ideal in $R$, then $PT \ne T$, and there exists a maximal ideal $Q$ of $T$ with
$P = R \cap Q$.

Remarks. This result inverts the construction of Proposition 8.43, of course
not necessarily uniquely. The examples in Section 7 illustrate what can happen
in simple cases. More detailed analysis of what can happen in general requires
some field theory and is postponed to Chapter IX, specifically when we discuss
“splitting of prime ideals in extensions.”

PROOF. If $PT \ne T$, then Proposition 8.8 supplies a maximal ideal $Q$ of $T$
with $PT \subseteq Q$. Since $1$ is not in $Q$, we then have $P \subseteq R \cap Q \subsetneq R$. Consequently
the maximality of $P$ implies that $P = R \cap Q$.

Arguing by contradiction, we now assume that $PT = T$. Localizing, let $S$
be the set-theoretic complement of $P$ in $R$, so that $S^{-1}P$ is the unique maximal
ideal of $S^{-1}R$ by Corollary 8.50. From $PT = T$, we can write

$$1 = a_1t_1 + \cdots + a_nt_n \qquad (*)$$

with each $a_i$ in $P$ and each $t_i$ in $T$. If we define $T_0$ to be the subring $R[t_1, \dots, t_n]$
of $T$, then $T_0$ is a finitely generated $R$ module by Proposition 8.37, and $S^{-1}T_0$
is therefore a finitely generated $S^{-1}R$ module. Equation $(*)$ shows that $1$ lies
in $PT_0$. Multiplying by an arbitrary element of $T_0$, we see that $PT_0 = T_0$.
Since $S^{-1}S^{-1} = S^{-1}$, we obtain $(S^{-1}P)(S^{-1}T_0) = S^{-1}T_0$. Nakayama's Lemma
(Lemma 8.51) allows us to conclude that $S^{-1}T_0 = 0$. Since $1$ lies in $T_0$ and $S^{-1}T_0$
is a subring of the field $K$, we have arrived at a contradiction. □


11.  Dedekind  Domains 

A  Dedekind  domain  is  an  integral  domain  with  the  following  three  properties: 

(i)  it  is  Noetherian, 

(ii)  it  is  integrally  closed, 

(iii)  every  nonzero  prime  ideal  is  maximal. 

Every  principal  ideal  domain  R  is  a  Dedekind  domain.  In  fact,  (i)  every  ideal 
in  R  is  singly  generated,  (ii)  R  is  integrally  closed  by  Proposition  8.41,  and  (iii) 
every  nonzero  prime  ideal  in  R  is  maximal  by  Corollary  8.16. 

We shall be interested in Dedekind domains that are obtained by enlarging a
principal ideal domain suitably. The general theorem in this direction is that if
$R$ is a Dedekind domain with field of fractions $F$ and if $K$ is a field containing
$F$ with $\dim_F K$ finite, then the integral closure of $R$ in $K$ is a Dedekind domain.
Let us state something less sweeping.

Theorem  8.54.  If  R  is  a  Dedekind  domain  with  field  of  fractions  F  and  if  K 
is a field containing $F$ with $\dim_F K$ finite, then the integral closure $T$ of $R$ in $K$
is  a  Dedekind  domain  if  any  of  the  following  three  conditions  holds: 

(a)  T  is  Noetherian, 

(b)  T  is  finitely  generated  as  an  R  module, 

(c) the field extension $F \subseteq K$ is “separable.”
Remarks. The term “separable” will be defined in Chapter IX, and the fact that
(c)  implies  (b)  will  be  proved  at  that  time.  It  will  be  proved  also  that  characteristic  0 
implies  separable.  For  now,  we  shall  be  content  with  showing  that  (b)  implies  (a) 
and  that  (a)  implies  that  T  is  a  Dedekind  domain. 

PROOF.  We  are  given  that  R  satisfies  conditions  (i),  (ii),  (iii)  above,  and  we  are 
to  verify  the  conditions  for  T.  Condition  (ii)  holds  for  T  by  Corollary  8.40,  and 
Proposition  8.45  shows  that  (iii)  holds.  If  (a)  holds,  then  T  satisfies  the  defining 
conditions  of  a  Dedekind  domain. 

Let  us  see  that  (b)  implies  (a).  If  (b)  holds,  then  Proposition  8.34  shows  that 
every $R$ submodule of $T$ is finitely generated. Since $T \supseteq R$, every $T$ submodule
of $T$ is in particular an $R$ submodule and hence is finitely generated. That is, every ideal of $T$ is finitely generated, and $T$ is
Noetherian.  Thus  (a)  holds,  and  the  proof  is  complete.  □ 

Example 2 of integral closures in Section 9 showed that the integral closure of
$\mathbb{Z}$ in $\mathbb{Q}[\sqrt{m}]$ is doubly generated as a $\mathbb{Z}$ module, a set of generators being either

$\{1, \sqrt{m}\}$ or $\{1, \tfrac{1}{2}(1 + \sqrt{m})\}$, depending on the value of $m$. Example 3 showed,
under the assumption that $K$ has characteristic different from 2, that the integral
closure of $K[x]$ in $K(x)[\sqrt{P(x)}]$ is doubly generated as a $K[x]$ module, a set of
generators being $\{1, \sqrt{P(x)}\}$. Since $\mathbb{Z}$ and $K[x]$ are principal ideal domains and
hence Dedekind domains, these examples give concrete cases in which hypothesis
(b) in Theorem 8.54 is satisfied. Consequently in each case the theorem asserts
that a certain explicit integral closure is a Dedekind domain.
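For the quadratic case in Example 2, the choice between the two generating sets can be sketched in Python. This sketch assumes the standard rule for squarefree $m \ne 0, 1$: the generators are $\{1, \tfrac{1}{2}(1 + \sqrt{m})\}$ exactly when $m \equiv 1 \bmod 4$, and $\{1, \sqrt{m}\}$ otherwise. The function names are hypothetical. The second function only records the monic polynomial $X^2 - X + \tfrac{1}{4}(1 - m)$ satisfied by $\tfrac{1}{2}(1 + \sqrt{m})$, whose coefficients are integers only in the first case.

```python
from fractions import Fraction


def is_squarefree(m: int) -> bool:
    """Return True if no square d*d > 1 divides m (m nonzero)."""
    n = abs(m)
    d = 2
    while d * d <= n:
        if n % (d * d) == 0:
            return False
        d += 1
    return True


def integral_basis(m: int) -> tuple[str, str]:
    """Generators, as labels, of the integral closure of Z in Q[sqrt(m)].

    Assumes m is squarefree and m != 0, 1 (the standard hypotheses).
    """
    if m in (0, 1) or not is_squarefree(m):
        raise ValueError("m must be squarefree and different from 0 and 1")
    # Python's % returns a value in {0, 1, 2, 3} even when m is negative.
    if m % 4 == 1:
        return ("1", "(1 + sqrt(m))/2")
    return ("1", "sqrt(m)")


def half_element_min_poly(m: int) -> tuple[Fraction, Fraction, Fraction]:
    """Coefficients of X^2 - X + (1 - m)/4, the monic polynomial with root
    (1 + sqrt(m))/2; they are all integers exactly when m = 1 mod 4."""
    return (Fraction(1), Fraction(-1), Fraction(1 - m, 4))
```

For instance, $m = -3$ falls in the first case because $-3 \equiv 1 \bmod 4$, while $m = -1$ (the Gaussian integers) falls in the second.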

Theorem 8.55 (unique factorization of ideals). If $R$ is a Dedekind domain,
then each nonzero proper ideal $I$ in $R$ decomposes as a finite product $\prod_{j=1}^n P_j^{k_j}$,
where the $P_j$'s are distinct nonzero prime ideals and the $k_j$'s are positive integers.
Moreover, 

(a)  the  decomposition  into  positive  powers  of  distinct  nonzero  prime  ideals 
is  unique  up  to  the  order  of  the  factors, 

(b) the power $P^k$ of a nonzero prime ideal $P$ appearing in the decomposition
of $I$ is characterized by the fact that $k$ is the unique nonnegative integer such that $P^k$
contains $I$ and $P^{k+1}$ does not contain $I$ (with $k = 0$ interpreted as saying
that $P$ is not one of the $P_j$),

(c) whenever $I$, $J_1$, $J_2$ are nonzero ideals with $IJ_1 = IJ_2$, then $J_1 = J_2$,

(d) whenever $I$ and $J_1$ are two nonzero proper ideals with $I \subseteq J_1$, then there
exists a nonzero ideal $J_2$ with $I = J_1J_2$.

Let us say that a nonzero ideal $J_1$ divides a nonzero ideal $I$ if $I = J_1J_2$ for
some ideal $J_2$. We say also that $J_1$ is a factor of $I$. Conclusion (d), once it
is established, is an important principle for working with ideals in a Dedekind
domain: to contain is to divide.
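In the principal ideal domain $\mathbb{Z}$, which is a Dedekind domain, Theorem 8.55 reduces to prime factorization of integers, and “to contain is to divide” says that $(n) \subseteq (m)$ exactly when $m$ divides $n$, with cofactor $J_2 = (n/m)$. The following Python sketch illustrates this special case only. The helper names are hypothetical, and trial division is used purely for simplicity.

```python
def prime_ideal_factorization(n: int) -> dict[int, int]:
    """Factor the ideal (n) of Z, n nonzero, as a product of prime ideals (p)^k.

    Returns {p: k}; the empty dict corresponds to (n) = Z, i.e. n = +1 or -1.
    """
    if n == 0:
        raise ValueError("(0) has no factorization into prime ideals")
    n = abs(n)
    factors: dict[int, int] = {}
    p = 2
    while p * p <= n:
        while n % p == 0:  # divide out each prime completely
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:  # whatever remains is a single prime
        factors[n] = factors.get(n, 0) + 1
    return factors


def contains(m: int, n: int) -> bool:
    """(m) contains (n) in Z exactly when m divides n."""
    return n % m == 0


def cofactor(m: int, n: int) -> int:
    """If (n) is contained in (m), return c with (n) = (m)(c)."""
    if not contains(m, n):
        raise ValueError("(m) does not contain (n)")
    return n // m
```

For example, $(360) \subseteq (12)$ and $(360) = (12)(30)$, in agreement with $(360) = (2)^3(3)^2(5)$.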
Thinking along these lines leads us to expect that prime ideals play some
special role with respect to containment. Such a role is captured by the following
lemma.

Lemma 8.56. In an integral domain, if $P$ is a prime ideal such that
$P \supseteq I_1 \cdots I_n$ for the product of the ideals $I_1, \ldots, I_n$, then $P \supseteq I_j$ for some $j$.

PROOF. By induction it is enough to handle $n = 2$. Thus suppose $P \supseteq I_1I_2$.
We are to show that $P \supseteq I_1$ or $P \supseteq I_2$. Arguing by contradiction, suppose
on the contrary that $x \in I_1$ and $y \in I_2$ are elements with $x \notin P$ and $y \notin P$.
Then $xy$ cannot be in $P$ since $P$ is prime, but $xy$ is in $I_1I_2 \subseteq P$, and we have a
contradiction. □

Lemma 8.57. Let $R$ be a Dedekind domain, and let $I$ be a nonzero ideal of
$R$. Then there exists a finite product $P_1 \cdots P_k$ of nonzero prime ideals, possibly
empty and not necessarily having distinct factors, such that $P_1 \cdots P_k \subseteq I$.

PROOF. We argue by contradiction. Among all nonzero ideals for which there
is no such finite product, choose one, say $J$, that is maximal under inclusion.
This choice is possible since $R$ is Noetherian. The ideal $J$ is not $R$, since the empty product is $R$, and $J$ cannot be prime since
otherwise $J \subseteq J$ would be the containment asserted by the lemma. Thus we can
choose elements $a_1$ and $a_2$ in $R$ with $a_1a_2 \in J$, $a_1 \notin J$, and $a_2 \notin J$. Define
ideals $I_1$ and $I_2$ by $I_1 = J + Ra_1$ and $I_2 = J + Ra_2$. These strictly contain
$J$, and their product manifestly has $I_1I_2 \subseteq J$. By maximality of $J$, we can find
products $P_1 \cdots P_k$ and $Q_1 \cdots Q_l$ of nonzero prime ideals with $P_1 \cdots P_k \subseteq I_1$ and
$Q_1 \cdots Q_l \subseteq I_2$. Then $P_1 \cdots P_kQ_1 \cdots Q_l \subseteq I_1I_2 \subseteq J$, contradiction. □

Lemma 8.58. Let $R$ be a Dedekind domain, regard $R$ as embedded in its field
of fractions $F$, let $P$ be a nonzero prime ideal in $R$, and define

$P^{-1} = \{x \in F \mid xP \subseteq R\}$.

Then the set $PP^{-1}$ of sums of products equals $R$.

PROOF. By definition of $P^{-1}$, and since 1 lies in $P^{-1}$, we have $P \subseteq PP^{-1} \subseteq R$. Since $P$ is an ideal and
$PP^{-1}$ is closed under addition and negatives, $PP^{-1}$ is an ideal. Property (iii) of
Dedekind domains shows that $P$ is a maximal ideal in $R$, and therefore $PP^{-1} = P$
or $PP^{-1} = R$. We are to rule out the first alternative.

Thus suppose that $PP^{-1} = P$. Since $R$ is Noetherian by (i), $P$ is a finitely
generated $R$ submodule of $F$. The equality $PP^{-1} = P$ implies that each member
$x$ of $P^{-1}$ has $xP \subseteq P$, and Proposition 8.35c implies that each such $x$ is integral
over $R$. Since $R$ is integrally closed by (ii), $x$ is in $R$. Thus $P^{-1} \subseteq R$, and the
definition of $P^{-1}$ shows that $P^{-1} = R$.
Fix a nonzero element $a$ of $P$. Applying Lemma 8.57, find a product of
nonzero prime ideals such that $P_1 \cdots P_k \subseteq (a) \subseteq P$. Without loss of generality,
we may assume that $k$ is as small as possible among all such inclusions. Since
$P$ is prime and $P_1 \cdots P_k \subseteq P$, Lemma 8.56 shows that $P$ contains some $P_j$, say
$P_1$. By (iii), $P_1$ is maximal, and therefore $P = P_1$. Form the product $P_2 \cdots P_k$,
taking this product to be $R$ if $k = 1$. Then $P_2 \cdots P_k$ is not a subset of $(a)$, by
minimality of $k$, and there exists a member $b$ of $P_2 \cdots P_k$ that is not in $(a)$. On
the other hand, $PP_2 \cdots P_k \subseteq (a)$ shows that $Pb \subseteq (a)$, hence that $a^{-1}bP \subseteq R$.
Thus $a^{-1}b$ is in $P^{-1}$, which we are assuming is $R$. In other words, $a^{-1}b$ is in $R$,
and $b$ is in $aR = (a)$, contradiction. □

Proof of Theorem 8.55. Arguing by contradiction, we may assume because
$R$ is Noetherian that $I$ is maximal among the nonzero proper ideals that do not
decompose as products of prime ideals. Then certainly $I$ is not prime. Application
of Proposition 8.8 produces a maximal ideal $P$ containing $I$, and $P$ is prime
by Corollary 8.11. Multiplying $I \subseteq P$ by $P^{-1}$ as in Lemma 8.58, we obtain
$I \subseteq P^{-1}I \subseteq P^{-1}P = R$, the equality holding by Lemma 8.58. Hence $P^{-1}I$
is an ideal. An equality $I = P^{-1}I$ would imply that $PI = PP^{-1}I = I$ by
Lemma 8.58, and then Proposition 8.52 would yield $I = 0$, a contradiction
to the hypothesis that $I$ is nonzero. An equality $P^{-1}I = R$ would imply
$I = PP^{-1}I = PR = P$ by Lemma 8.58, in contradiction to the fact that
$I$ is not prime. We conclude that $I \subsetneq P^{-1}I \subsetneq R$. The maximal choice
of $I$ shows that $P^{-1}I$ decomposes as a product $P^{-1}I = P_1 \cdots P_r$ of prime
ideals, not necessarily distinct. One more application of Lemma 8.58 yields
$I = PP^{-1}I = PP_1 \cdots P_r$, and we have a contradiction. We conclude that every
nonzero proper ideal decomposes as a product of prime ideals. Grouping equal
factors, we can write the decomposition as in the statement of the theorem.

Next let us establish uniqueness as in (a). Suppose that we have two equal
decompositions $P_1 \cdots P_r = Q_1 \cdots Q_s$ as the product of prime ideals, and suppose
that $r \le s$. We show by induction on $r$ that $r = s$ and that the factors on the
two sides match, apart from their order. The base case of the induction is $r = 0$,
and then $s = 0$ because any nonempty product of nonzero prime ideals is a proper ideal. Assume the uniqueness for $r - 1$. Since $P_1$ is
prime and $P_1 \supseteq Q_1 \cdots Q_s$, $P_1 \supseteq Q_j$ for some $j$ by Lemma 8.56. By (iii) for
Dedekind domains, $Q_j$ is a maximal ideal, and therefore $P_1 = Q_j$. Multiplying
the equality $P_1 \cdots P_r = Q_1 \cdots Q_s$ by $P_1^{-1}$ and applying Lemma 8.58 to each
side, we obtain $P_2 \cdots P_r = Q_1 \cdots Q_{j-1}Q_{j+1} \cdots Q_s$. The inductive hypothesis
implies that $r - 1 = s - 1$ and the factors on the two sides match, apart from
their order. Then we can conclude about the equality $P_1 \cdots P_r = Q_1 \cdots Q_s$ that
$r = s$ and that the factors on the two sides match, apart from their order. This
proves (a).

Let us establish the formula in (b) for $k_j$. Suppose that $P$ is a nonzero prime ideal.
By (a), we can write $I = P^nJ$ for a certain integer $n \ge 0$ in such a way that $P$
does not appear in the unique decomposition of $J$. Certainly $P^k \supseteq I$ for $k \le n$
because $P^k \supseteq P^kP^{n-k} = P^n \supseteq P^nJ = I$. Suppose $P^{n+1} \supseteq I$. Multiplying
$P^{n+1} \supseteq I = P^nJ$ by $n$ factors of $P^{-1}$ and using Lemma 8.58 repeatedly, we
obtain $P \supseteq P^{-n}I = J$. Since $P$ is prime, Lemma 8.56 shows that $P$ must
contain one of the factors when $J$ is decomposed as the product of prime ideals,
and we have a contradiction to the maximality of this factor unless this factor is
$P$ itself. In this case, $P$ appears in the decomposition of $J$, and again we have a
contradiction.
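In $\mathbb{Z}$, part (b) of Theorem 8.55 can be read as an algorithm: the exponent of $(p)$ in $(n)$ is the largest $k$ with $(p)^k \supseteq (n)$, that is, with $p^k$ dividing $n$. Here is a minimal sketch, assuming $R = \mathbb{Z}$ and a prime $p$. The function name is hypothetical.

```python
def exponent_by_containment(p: int, n: int) -> int:
    """Unique k >= 0 with (p)^k containing (n) but (p)^(k+1) not containing (n).

    Assumes R = Z, p a prime (p >= 2), and n != 0, as in Theorem 8.55b.
    """
    if p < 2 or n == 0:
        raise ValueError("need a prime p >= 2 and a nonzero n")
    k = 0
    # (p^(k+1)) contains (n) exactly when p^(k+1) divides n.
    while n % p ** (k + 1) == 0:
        k += 1
    return k
```

A return value of 0 corresponds to the convention in (b) that $P$ is not one of the $P_j$.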

For (c), if $IJ_1 = IJ_2$, substitute the unique decompositions as products of
prime ideals for $I$, $J_1$, and $J_2$, and use (a) to cancel the factors from $I$ on each
side, obtaining $J_1 = J_2$.

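For (d), suppose that $I$ and $J_1$ are two nonzero proper ideals with $I \subseteq J_1$. If
$P_i^{k_i}$ is the largest power of a prime ideal $P_i$ appearing in the decomposition of $J_1$,
then $P_i^{k_i} \supseteq J_1 \supseteq I$, and (b) shows that $P_i$ appears in the decomposition of $I$ with exponent at least $k_i$. In
other words, if $P_i^{l_i}$ is the largest power of $P_i$ appearing in the decomposition of $I$,
then $l_i \ge k_i$. Let $J_2 = \prod_i P_i^{l_i - k_i}$, the product being taken over the prime ideals $P_i$ appearing in the decomposition of $I$, with $k_i = 0$ when $P_i$ does not appear in $J_1$. Then we obtain $I = J_1J_2$, and (d) is proved. □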

Corollary 8.59. Let $R$ be a Dedekind domain, and let $P$ be a nonzero prime
ideal in $R$. Then there exists an element $\pi$ in $P$ such that $\pi$ is not in $P^2$, and any
such element has the property that $\pi^k$ is not in $P^{k+1}$ for any $k \ge 1$.

PROOF. Proposition 8.52 shows that $P^2$ is a proper subset of $P$, and therefore
we can find an element $\pi$ in $P$ that is not in $P^2$. Since the principal ideal $(\pi)$ has
$(\pi) \subseteq P$ and $(\pi) \not\subseteq P^2$, the factorization of $(\pi)$ involves $P$ but not $P^2$. Thus we
can use Theorem 8.55 to write $(\pi) = PQ_1 \cdots Q_n$ for prime ideals $Q_1, \ldots, Q_n$
different from $P$. Then $(\pi^k) = (\pi)^k = P^kQ_1^k \cdots Q_n^k$, and (b) of the theorem
says that $P^{k+1}$ does not contain $(\pi^k)$. □
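In $\mathbb{Z}$ with $P = (p)$, the elements $\pi$ of Corollary 8.59 are exactly the integers $pu$ with $p \nmid u$. The following sketch assumes $R = \mathbb{Z}$ and uses hypothetical helper names. It checks the two properties in the corollary for a sample $\pi$ over a finite range of $k$, so it is a spot check and not a proof.

```python
def is_uniformizer(p: int, pi: int) -> bool:
    """In Z with P = (p): pi lies in P but not in P^2."""
    return pi % p == 0 and pi % (p * p) != 0


def check_powers(p: int, pi: int, kmax: int) -> bool:
    """Spot check that pi^k is not in P^(k+1) = (p^(k+1)) for 1 <= k <= kmax."""
    return all(pi ** k % p ** (k + 1) != 0 for k in range(1, kmax + 1))
```

For example, $\pi = 21 = 3 \cdot 7$ works for $p = 3$, whereas $18 = 3^2 \cdot 2$ lies in $P^2$ and fails already at $k = 1$.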

Corollary 8.60. Let $R$ be a Dedekind domain, and let $P$ be a nonzero prime
ideal in $R$. For any integer $e \ge 1$, the natural action of $R$ on powers of $P$
makes $P^{e-1}/P^e$ into a vector space over the field $R/P$, and this vector space is
1-dimensional.

Remarks.  This  technical-sounding  corollary  will  be  used  crucially  late  in 
Chapter  IX  of  this  volume  and  again  in  Chapter  V  of  Advanced  Algebra. 

Proof. Since $R(P^{e-1}) \subseteq P^{e-1}$ and $P(P^{e-1}) \subseteq P^e$, we obtain

$(R/P)(P^{e-1}/P^e) \subseteq P^{e-1}/P^e$.

Thus $P^{e-1}/P^e$ is a unital $R/P$ module, i.e., a vector space over the field $R/P$.
We show that it has dimension 1. Corollary 8.59 shows that there exists a member
$\pi$ of $P$ not in $P^2$, and it shows that $\pi^k$ is not in $P^{k+1}$ for any $k$. This element
$\pi$ has the property that $(\pi) = PQ_1 \cdots Q_r$ for nonzero prime ideals $Q_1, \ldots, Q_r$
distinct from $P$, and thus

$R\pi^{e-1} = (\pi^{e-1}) = P^{e-1}Q_1^{e-1} \cdots Q_r^{e-1}$.

Hence

$R\pi^{e-1} + P^e = P^{e-1}(Q_1^{e-1} \cdots Q_r^{e-1} + P)$.

The ideal in parentheses on the right side strictly contains $P$ since the failure
of $P$ to divide $Q_1^{e-1} \cdots Q_r^{e-1}$ means that $P$ does not contain $Q_1^{e-1} \cdots Q_r^{e-1}$ (by
Theorem 8.55d). Since $P$ is maximal, the ideal in parentheses is $R$, and we see
that $R(\pi^{e-1} + P^e) = P^{e-1}/P^e$. Therefore $(R/P)(\pi^{e-1} + P^e) = P^{e-1}/P^e$. This
formula says that $P^{e-1}/P^e$ consists of all scalar multiples of the single element $\pi^{e-1} + P^e$,
which is nonzero because $\pi^{e-1} \notin P^e$, and it follows that $P^{e-1}/P^e$ is 1-dimensional. □
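For $R = \mathbb{Z}$ and $P = (p)$, Corollary 8.60 says that $p^{e-1}\mathbb{Z}/p^e\mathbb{Z}$ has exactly $p$ elements and is spanned over $\mathbb{F}_p = \mathbb{Z}/p\mathbb{Z}$ by the coset of $\pi^{e-1}$. Below is a small Python check of this special case. It represents cosets by residues modulo $p^e$, and the function names are hypothetical.

```python
def quotient_elements(p: int, e: int) -> set[int]:
    """Elements of P^(e-1)/P^e for P = (p) in Z, as residues mod p^e."""
    # The multiples of p^(e-1) in [0, p^e) represent the p distinct cosets.
    return {x % p ** e for x in range(0, p ** e, p ** (e - 1))}


def scalar_multiples(pi: int, p: int, e: int) -> set[int]:
    """All (c + P)(pi^(e-1) + P^e) for c = 0, ..., p-1, as residues mod p^e."""
    return {(c * pi ** (e - 1)) % p ** e for c in range(p)}
```

With $p = 5$, $e = 3$, and $\pi = 10$, both sets equal $\{0, 25, 50, 75, 100\}$, so the $p$ scalar multiples of one nonzero element exhaust the quotient.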

Lemma 8.61. If $P$ and $Q$ are distinct maximal ideals in an integral domain $R$
and if $k$ and $l$ are positive integers, then $P^k + Q^l = R$.

PROOF. We know that $P^k + Q^l$ is an ideal. Arguing by contradiction, assume
that it is proper. Then we can find a maximal ideal $M$ with $M \supseteq P^k + Q^l$. This
$M$ satisfies $M \supseteq P^k$ and $M \supseteq Q^l$. Since $M$ is prime, Lemma 8.56 gives $M \supseteq P$ and $M \supseteq Q$. Since
$P$ and $Q$ are distinct and maximal, we obtain $P = M = Q$, contradiction. □
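In $\mathbb{Z}$ with $P = (p)$ and $Q = (q)$ for distinct primes $p$ and $q$, Lemma 8.61 says that $1 = xp^k + yq^l$ for some integers $x$ and $y$. The extended Euclidean algorithm finds such $x$ and $y$. The standard iterative version below is offered as an illustration of this special case only.

```python
def extended_gcd(a: int, b: int) -> tuple[int, int, int]:
    """Return (g, x, y) with g = gcd(a, b) = x*a + y*b (a, b not both 0)."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        # Each step keeps old_r = old_x*a + old_y*b and r = x*a + y*b.
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    return old_r, old_x, old_y
```

For $p^k = 2^3$ and $q^l = 3^2$ it returns $(1, -1, 1)$, exhibiting $1 = -8 + 9 \in (2)^3 + (3)^2$.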

Corollary  8.62.  If  R  is  a  Dedekind  domain  with  only  finitely  many  prime 
ideals,  then  R  is  a  principal  ideal  domain. 

Remarks. Corollary 8.48 may be used to produce examples to which Corollary
8.62 is applicable. All we have to do is to take one of our standard Dedekind
domains $R$ and localize with respect to a nonzero prime ideal $P$. Corollary 8.48
says that the result $R_P$ is a Dedekind domain, and it has a unique maximal ideal,
hence a unique nonzero prime ideal. The conclusion is that $R_P$ is a principal
ideal domain.

PROOF. Let $P_1, \ldots, P_n$ be the distinct nonzero prime ideals. Theorem 8.55
shows that any nonzero ideal $I$ in $R$ factors uniquely as $I = P_1^{k_1} \cdots P_n^{k_n}$ with
each $k_j \ge 0$. For $1 \le i \le n$, Corollary 8.59 produces $\pi_i$ in $P_i$ such that $\pi_i$ is not
in $P_i^2$, and it shows that $\pi_i^{k_i}$ is not in $P_i^{k_i+1}$.

Lemma 8.61 gives $P_i^{k_i+1} + P_j^{k_j+1} = R$ if $i \ne j$. Applying the Chinese Remainder
Theorem (Theorem 8.27a), we can find an element $a$ in $R$ with $a \equiv \pi_i^{k_i} \bmod P_i^{k_i+1}$
for $1 \le i \le n$. Using Theorem 8.55 again, let $(a) = P_1^{l_1} \cdots P_n^{l_n}$ be the unique
factorization of the principal ideal $(a)$. The defining property of $a$ shows that $a$
is in $P_i^{k_i}$ but not in $P_i^{k_i+1}$ for each $i$. Thus $(a)$ is contained in $P_i^{k_i}$ but not in $P_i^{k_i+1}$.
By Theorem 8.55b, $l_i = k_i$ for each $i$. Hence the ideal $I = P_1^{k_1} \cdots P_n^{k_n} = (a)$ is
exhibited as principal. □
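The Chinese Remainder step in the proof can be imitated in $\mathbb{Z}$, taking $\pi_i = p_i$. This is only an illustration under that assumption. Because $\mathbb{Z}$ has infinitely many prime ideals, the element $a$ produced has the prescribed exponents at the chosen primes but may have other prime factors, which cannot happen in the situation of Corollary 8.62. The helper names are hypothetical, and `pow(m, -1, n)` (Python 3.8 and later) computes an inverse modulo $n$.

```python
def crt(residues: list[int], moduli: list[int]) -> int:
    """Solve a = residues[i] (mod moduli[i]) for pairwise coprime moduli."""
    a, m = 0, 1
    for r, n in zip(residues, moduli):
        # Choose t with a + m*t = r (mod n); m is invertible mod n by coprimality.
        t = ((r - a) * pow(m, -1, n)) % n
        a, m = a + m * t, m * n
    return a % m


def valuation(p: int, a: int) -> int:
    """Exponent of the prime p in the nonzero integer a."""
    k = 0
    while a % p ** (k + 1) == 0:
        k += 1
    return k


def element_with_exponents(primes: list[int], exps: list[int]) -> int:
    """An a with a = p_i^k_i (mod p_i^(k_i + 1)) for each i, taking pi_i = p_i."""
    residues = [p ** k for p, k in zip(primes, exps)]
    moduli = [p ** (k + 1) for p, k in zip(primes, exps)]
    return crt(residues, moduli)
```

For instance, `element_with_exponents([2, 3, 5], [3, 1, 0])` returns $696 = 2^3 \cdot 3 \cdot 29$. Its exponents at 2, 3, and 5 are 3, 1, and 0 as prescribed. The extra factor 29 reflects the caveat above.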
Corollary 8.63. If $R$ is a Dedekind domain and if $I = \prod_{j=1}^n P_j^{k_j}$ is the unique
factorization of a nonzero proper ideal $I$ as the product of positive powers of
distinct prime ideals $P_j$, then the map $R \to \prod_{j=1}^n R/P_j^{k_j}$ defined by
$r \mapsto (\ldots, r + P_j^{k_j}, \ldots)$ descends to a ring isomorphism
$R/I \cong R/P_1^{k_1} \times \cdots \times R/P_n^{k_n}$.

PROOF. Lemma 8.61 shows that $P_i^{k_i} + P_j^{k_j} = R$ if $i \ne j$. Then the result
follows immediately from the Chinese Remainder Theorem (Theorem 8.27). □
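For $R = \mathbb{Z}$ and $I = (n)$, Corollary 8.63 is the classical isomorphism $\mathbb{Z}/n\mathbb{Z} \cong \prod_j \mathbb{Z}/p_j^{k_j}\mathbb{Z}$. The brute-force check below is written for small $n$ only. It verifies that the map $r \mapsto (r \bmod q)_q$ is a bijection that respects addition and multiplication, and the function names are hypothetical. It also shows why coprimality matters: with the moduli 2 and 30, whose product is still 60, the map fails to be injective.

```python
import math


def crt_map(r: int, moduli: list[int]) -> tuple[int, ...]:
    """The map r -> (r + (q))_q, with each coset represented by r mod q."""
    return tuple(r % q for q in moduli)


def is_ring_isomorphism(n: int, moduli: list[int]) -> bool:
    """Brute-force check that crt_map induces a ring isomorphism
    Z/nZ -> product of the Z/qZ (feasible only for small n)."""
    if math.prod(moduli) != n:
        return False  # the two sides have different sizes
    if len({crt_map(r, moduli) for r in range(n)}) != n:
        return False  # not injective, hence not bijective
    for a in range(n):
        for b in range(n):
            ia, ib = crt_map(a, moduli), crt_map(b, moduli)
            # Componentwise sum and product in the target ring.
            s = tuple((x + y) % q for x, y, q in zip(ia, ib, moduli))
            t = tuple((x * y) % q for x, y, q in zip(ia, ib, moduli))
            if crt_map((a + b) % n, moduli) != s or crt_map((a * b) % n, moduli) != t:
                return False
    return True
```

Here `is_ring_isomorphism(60, [4, 3, 5])` holds, matching $(60) = (2)^2(3)(5)$.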

12.  Problems 

1. This problem examines ring homomorphisms of the field of real numbers into
itself that carry 1 into 1. Let $\varphi$ be such a homomorphism.

(a) Prove that $\varphi$ is the identity on $\mathbb{Q}$.

(b) Prove that $\varphi$ maps squares into squares.

(c) Prove that $\varphi$ respects the ordering of $\mathbb{R}$, i.e., that $a \le b$ implies $\varphi(a) \le \varphi(b)$.

(d) Prove that $\varphi$ is the identity on $\mathbb{R}$.

2. An element $r$ in a commutative ring with identity is called nilpotent if $r^n = 0$
for some positive integer $n$. Prove that if $r$ is nilpotent, then $1 + r$ is a unit.

3. If $R$ is a field, prove that the embedding of $R$ in its field of fractions exhibits $R$
as isomorphic to its field of fractions.

4. Prove that $X$ is prime in $R[X]$ if $R$ is an integral domain.

5. Suppose that $R$ is an integral domain that is not a field.

(a) Prove that there is a nonzero prime ideal in $R[X]$ that is not maximal.

(b) Prove that there is an ideal in $R[X]$ that is not principal.

6. This problem makes use of real-analysis facts concerning closed bounded intervals
of the real line. Let $R$ be the ring of all continuous functions from $[0, 1]$ into
$\mathbb{R}$, with pointwise multiplication as the ring multiplication.

(a) Prove for each $x_0$ in $[0, 1]$ that the set $I_{x_0}$ of members of $R$ that vanish at $x_0$
is a maximal ideal of $R$.

(b) Prove that any maximal ideal $I$ of $R$ that is not some $I_{x_0}$ contains finitely
many members $f_1, \ldots, f_n$ of $R$ that have no common zero on $[0, 1]$.

(c) By considering $f_1^2 + \cdots + f_n^2$ in (b), prove that every maximal ideal of $R$ is
of the form $I_{x_0}$ for some $x_0$ in $[0, 1]$.

7. Let $R$ be the ring of all bounded continuous functions from $\mathbb{R}$ into $\mathbb{R}$, with
pointwise multiplication as the ring multiplication. Say that a member $f$ of $R$
vanishes at infinity if for each $\epsilon > 0$, there is some $N$ such that $|f(x)| < \epsilon$
whenever $|x| \ge N$. Answer the following:




VIII.  Commutative  Rings  and  Their  Modules 


(a) Show that the subset $I_\infty$ of all members of $R$ that vanish at infinity is an
ideal but not a maximal ideal.

(b) Why must $R$ have at least one maximal ideal $I$ that contains $I_\infty$?

(c) Why can there be no $x_0$ in $\mathbb{R}$ such that the maximal ideal $I$ of (b) consists
of all members of $R$ that vanish at $x_0$?

8. Let $I$ be a nonzero ideal in $\mathbb{Z}[\sqrt{-5}]$.

(a) Prove that $I$ contains some positive integer.

(b) Prove that $I$, as an abelian group under addition, is free abelian of rank 2.

(c) If $n$ denotes the least positive integer in $I$, prove that $I$ has a $\mathbb{Z}$ basis of the
form $\{n, a + b\sqrt{-5}\}$ for a suitable member $a + b\sqrt{-5}$ of $\mathbb{Z}[\sqrt{-5}]$.

9. Let $\varphi : R \to R'$ be a homomorphism of commutative rings with identity such
that $\varphi(1) = 1$. Prove that if $P'$ is a prime ideal in $R'$, then $P = \varphi^{-1}(P')$ is a
prime ideal in $R$.

10. Determine the maximal ideals of each of the following rings:

(a) $\mathbb{R} \times \mathbb{R}$,

(b) $\mathbb{R}[X]/(X^2)$,

(c) $\mathbb{R}[X]/(X^2 - 3X + 2)$,

(d) $\mathbb{R}[X]/(X^2 + X + 1)$.

11. (a) Prove or disprove: If $I$ is a nonzero prime ideal in $\mathbb{Q}[X]$, then $\mathbb{Q}[X]/I$ is a
unique factorization domain.

(b) Prove or disprove: If $I$ is a nonzero prime ideal in $\mathbb{Z}[X]$, then $\mathbb{Z}[X]/I$ is a
unique factorization domain.

12. (Partial fractions) Let $R$ be a principal ideal domain, and let $F$ be its field of
fractions.

(a) Let $n$ be a nonzero member of $R$ with a factorization $n = cd$ such that
$\mathrm{GCD}(c, d) = 1$. Prove for each $m$ in $R$ that the member $mn^{-1}$ of $F$ has a
decomposition as $mn^{-1} = ac^{-1} + bd^{-1}$ with $a$ and $b$ in $R$.

(b) Let $n$ be a nonzero member of $R$ with a factorization $n = p_1^{k_1} \cdots p_r^{k_r}$, the
elements $p_j$ being nonassociate primes in $R$. Prove for each $m$ in $R$ that the
member $mn^{-1}$ of $F$ has a decomposition as $mn^{-1} = q_1p_1^{-k_1} + \cdots + q_rp_r^{-k_r}$
with all $q_i$ in $R$.

13. (a) By adapting the proof that the ring of Gaussian integers forms a Euclidean
domain, prove that the function $\delta(a + b\sqrt{-2}) = a^2 + 2b^2$ satisfies $\delta(rr') =
\delta(r)\delta(r')$ and exhibits $\mathbb{Z}[\sqrt{-2}]$ as a Euclidean domain.

(b) It was shown in Section 9 that $\mathbb{Z}[\sqrt{-3}]$ is not a unique factorization domain,
hence cannot be a Euclidean domain. What goes wrong with continuing the
adaptation in (a) so that it applies to $\mathbb{Z}[\sqrt{-3}]$?
14. Let $G$ be a group, and let $R$ be a commutative ring with identity. Examples 16
and 17 in Section 1 defined the group algebra $RG$ and the $R$ algebra $C(G, R)$
of functions from $G$ into $R$, convolution being the multiplication in $C(G, R)$.
Prove that the mapping $g \mapsto f_g$ described with Example 17 extends to an $R$
algebra isomorphism of $RG$ onto $C(G, R)$.

15.  Let  I  be  an  ideal  in  Z[X],  and  suppose  that  the  lowest  degree  of  a  nonzero 
polynomial  in  I  is  n  and  that  I  contains  some  monic  polynomial  of  degree  n. 
Prove  that  I  is  a  principal  ideal. 

16. For each integer $n > 0$, exhibit an ideal $I_n$ in $\mathbb{Z}[X]$ that cannot be written with
fewer than $n$ generators.

17. Let $\varphi$ be the substitution homomorphism $\varphi : K[x, y] \to K[t]$ defined by $x \mapsto t^2$,
$y \mapsto t^3$, and $\varphi(c) = c$ for $c \in K$.

(a) Prove that $\ker \varphi$ is the principal ideal $(y^2 - x^3)$.

(b) What is $\operatorname{image} \varphi$?

18. Let $R = \mathbb{Z}[i]$.

(a) Show that each unital $R$ module $M$ may be regarded as an abelian group
with an abelian-group homomorphism $\varphi : M \to M$ for which $\varphi^2$ is the
mapping $m \mapsto -m$.

(b) Show conversely that if $M$ is an abelian group and there exists an abelian-group
homomorphism $\varphi : M \to M$ for which $\varphi^2$ is the mapping $m \mapsto -m$,
then $M$ may be regarded as a unital $R$ module.

19. Let $R$ be a unique factorization domain, and let $F$ be its field of fractions. Let
$A(X)$ and $B(X)$ be nonzero polynomials in $F[X]$, let $A_0(X)$ and $B_0(X)$ be their
associated primitive polynomials, and suppose that $B(X)$ divides $A(X)$ in $F[X]$.
Prove that $B_0(X)$ divides $A_0(X)$ in $R[X]$.

20.  Prove  that  an  integral  domain  with  finitely  many  elements  is  a  field. 

21. Two proofs of Theorem 8.18 were given, one using direct multiplication of
polynomials and the other using polynomials with coefficients taken modulo
$(p)$, and it was stated that proofs in both these styles could be given for Corollary
8.22. A proof in the first style was supplied in the text. Supply a proof in the
second style.

22. Let $K$ be a field.

(a) Prove that $\det\begin{pmatrix} W & X \\ Y & Z \end{pmatrix}$, when considered as a polynomial in $K[W, X, Y, Z]$,
is irreducible.

(b) Let $X_{ij}$ be indeterminates for $i$ and $j$ from 1 to $n$. Doing an induction, prove
that the polynomial $\det[X_{ij}]$ is irreducible in $K[X_{11}, X_{12}, \ldots, X_{nn}]$.

23. Prove that two members of $\mathbb{Z}[X]$ are relatively prime in $\mathbb{Q}[X]$ if and only if the
ideal they generate in $\mathbb{Z}[X]$ contains a nonzero integer.




24. Let $V$ be the $\mathbb{Z}[i]$ module with two generators $u_1$, $u_2$ related by the conditions
$(1 + i)u_1 + (2 - i)u_2 = 0$ and $3u_1 + 5u_2 = 0$. Express $V$ as the direct sum of
cyclic $\mathbb{Z}[i]$ modules.

Problems 25-26 concern the ring $R = \mathbb{Z}[\tfrac{1}{2}(1 + \sqrt{-m})]$, where $m$ is a square-free
integer $> 1$ with $m \equiv 3 \bmod 4$. Let $F = \mathbb{Q}[\sqrt{-m}]$ be the field of fractions of $R$.

25. For $z = x + y\sqrt{-m}$ in $F$, define $\delta(z) = x^2 + my^2$.

(a) Show that $\delta(zw) = \delta(z)\delta(w)$.

(b) Show that if for each $z$ in $F$ there is some $r$ in $R$ with $\delta(z - r) < 1$, then $\delta$
exhibits $R$ as a Euclidean domain.

26. Prove that the condition of part (b) of the previous problem is satisfied for $m = 3$,
7, and 11, and conclude that $\mathbb{Z}[\tfrac{1}{2}(1 + \sqrt{-m})]$ is a Euclidean domain for these
values of $m$.

Problems 27-31 classify the primes in the ring $\mathbb{Z}[i]$ of Gaussian integers. This ring
is a Euclidean domain and therefore is a unique factorization domain. Members of
this ring will be written as $a + bi$, and it is understood that $a$ and $b$ are in $\mathbb{Z}$. Put
$N(a + bi) = (a + bi)(a - bi) = a^2 + b^2$.

27. Let $a + bi$ be prime in $\mathbb{Z}[i]$. Prove that

(a) $a - bi$ is prime.

(b) $N(a + bi)$ is a power of some positive prime $p$ in $\mathbb{Z}$.

(c) $N(a + bi)$ equals $p$ or $p^2$ when $p$ is as in (b).

(d) $N(a + bi) = p^2$ in (c) forces $a + bi = p$, apart from a unit factor.

28. Prove that no prime $a + bi$ in $\mathbb{Z}[i]$ has $N(a + bi) = p$ with $p$ of the form $4n + 3$.
Conclude that every positive prime in $\mathbb{Z}$ of the form $4n + 3$ is a prime in $\mathbb{Z}[i]$.

29. Prove that the only primes $a + bi$ of $\mathbb{Z}[i]$ for which $N(a + bi)$ equals 2 or $2^2$ are
$1 + i$ and its associates, for which $N(a + bi) = 2$.

30. Prove that if $p$ is a positive prime in $\mathbb{Z}$ of the form $4n + 1$, then $-1$ is a square
in the finite field $\mathbb{F}_p$.

31. Let $p$ be a positive prime in $\mathbb{Z}$ of the form $4n + 1$.

(a) Prove that there exist ring homomorphisms $\varphi_1$ of $\mathbb{Z}[X]$ onto $\mathbb{F}_p[X]/(X^2 + 1)$
and $\varphi_2$ of $\mathbb{Z}[X]$ onto $\mathbb{Z}[i]/(p)$.

(b) Prove that $\ker \varphi_1$ and $\ker \varphi_2$ are both equal to the ideal $(p, X^2 + 1)$ in $\mathbb{Z}[X]$,
and deduce a ring isomorphism $\mathbb{Z}[i]/(p) \cong \mathbb{F}_p[X]/(X^2 + 1)$.

(c) Taking into account the results of Problems 27 and 30, show that $p$ is not
prime in $\mathbb{Z}[i]$ and is therefore of the form $p = N(a + bi) = a^2 + b^2$ for
some prime $a + bi$ in $\mathbb{Z}[i]$.

(d) Prove a uniqueness result for the decomposition $p = a^2 + b^2$, that if also
$p = a'^2 + b'^2$, then $a' + b'i$ is an associate either of $a + bi$ or of $a - bi$.
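Problems 30 and 31 can be explored numerically before they are proved. The brute-force sketch below uses hypothetical helper names and is suitable only for small $p$. It finds a square root of $-1$ modulo $p$ and a representation $p = a^2 + b^2$. It illustrates the statements and proves nothing. For example, for $p = 13$ it returns $x = 5$ (since $5^2 + 1 = 26$) and $(a, b) = (2, 3)$.

```python
import math


def sqrt_minus_one(p: int) -> int:
    """Smallest x in [1, p) with x^2 = -1 (mod p), by brute force."""
    for x in range(1, p):
        if (x * x + 1) % p == 0:
            return x
    raise ValueError("-1 is not a square mod p")


def two_squares(p: int) -> tuple[int, int]:
    """Positive a <= b with a^2 + b^2 = p, searching a = 1, 2, ..."""
    a = 1
    while 2 * a * a <= p:  # keeps a <= b
        b2 = p - a * a
        b = math.isqrt(b2)
        if b * b == b2:
            return a, b
        a += 1
    raise ValueError("p is not a sum of two positive squares")
```

For a prime of the form $4n + 3$, such as 7, both functions raise `ValueError`, in line with Problem 28.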
Problems 32-35 establish a theory of elementary divisors. This theory provides
a different uniqueness result, beyond the one in Corollary 8.28, to accompany the
Fundamental Theorem of Finitely Generated Modules over a Principal Ideal Domain.
When specialized to $K[X]$ for a field $K$, the theory yields the rational canonical form
of a member of $M_n(K)$. Let $R$ be a nonzero principal ideal domain. If $C$ and $D$ are
members of $M_{mn}(R)$, let us say that $C$ and $D$ are equivalent if there exist $A$ in
$M_m(R)$ and $B$ in $M_n(R)$ with $\det A$ in $R^\times$, $\det B$ in $R^\times$, and $D = ACB$. Fix $m$
and $n$, and put $k = \min(m, n)$. If $C$ is a member of $M_{mn}(R)$, its diagonal entries
are the entries $C_{11}, C_{22}, \ldots, C_{kk}$. The matrix $C$ will be called diagonal if its only
nonzero entries are diagonal entries. Problems 26-31 of Chapter V are relevant for
Problem 34.

32. (a) Suppose that $C$ is a diagonal matrix in $M_{mn}(R)$ with $C_{11} \ne 0$. Show that
$C$ is equivalent to a matrix $C'$ described as follows: all entries of $C'$ are the
same as those of $C$ except possibly for the entries $C'_{21}, \ldots, C'_{k1}$ in the first
column, and these satisfy $C'_{j1} = C_{jj}$.

(b) By applying the algorithm of Lemma 8.26 to the matrix $C'$ in (a), prove that
any nonzero diagonal matrix $C$ in $M_{mn}(R)$ is equivalent to a diagonal matrix
$C''$ such that $C''_{11}$ divides all the diagonal entries of $C''$.

(c) By iterating the construction in (a) and (b), prove that any diagonal matrix
$C$ in $M_{mn}(R)$ is equivalent to a diagonal matrix $D$ having the following
properties: The nonzero diagonal entries of $D$ are the entries $D_{jj}$ with
$1 \le j \le l$ for some integer $l$ with $0 \le l \le k$. For each $j$ with $1 \le j < l$,
$D_{jj}$ divides $D_{j+1,j+1}$.

33. (a) Establish the following uniqueness theorem: Let $D$ and $E$ be diagonal
matrices in $M_{mn}(R)$ whose diagonal entries satisfy the divisibility property
in (c) of the previous problem. Prove that if $D$ and $E$ are equivalent, then they
have the same number of nonzero entries, and their corresponding diagonal
entries are associates.

(b) Combine Corollary 8.29, Problem 32, and Problem 33a to establish the
following elementary-divisors version of the Fundamental Theorem of
Finitely Generated Modules: If $R$ is a principal ideal domain, then any
finitely generated unital $R$ module $M$ is the direct sum of a nonunique free $R$
submodule $\bigoplus_{j=1}^{s} R$ of a well-defined finite rank $s \ge 0$ and the $R$ submodule
$T$ of all members $m$ of $M$ such that $rm = 0$ for some $r \ne 0$ in $R$. In turn,
the $R$ submodule $T$ is isomorphic to a direct sum $T \cong \bigoplus_{j=1}^{l} R/(d_j)$, where
the $d_j$ are nonzero nonunits in $R$ such that $d_j$ divides $d_{j+1}$ for $1 \le j < l$.
The number $l$ of summands and the ideals $(d_j)$ are uniquely determined
by $M$.

34. (a) (Rational decomposition) Let $K$ be a field, and let $L : V \to V$ be a $K$
linear mapping from a finite-dimensional $K$ vector space $V$ to itself. By
applying Theorem 8.25 and the results of the previous problems to $V$ as a
$K[X]$ module with $Xv = L(v)$, prove the following: $V$ can be written as
the direct sum of cyclic subspaces $V_1, \ldots, V_r$ under $L$ in such a way that
the minimal polynomial of $L$ on $V_j$ divides the minimal polynomial of $L$ on
$V_{j+1}$ for $1 \le j < r$; moreover, the integer $r$ and the minimal polynomials
are uniquely determined by $L$, and any two linear mappings with the same
$r$ and matching minimal polynomials are similar over $K$.

(b) (Rational canonical form) Interpret the result of (a) as saying something
about similarity over $K$ of any matrix in $M_{nn}(K)$ to a certain block diagonal
matrix with blocks of the form in Problem 28 for Chapter V and with minimal
polynomials having a suitable divisibility property.

35. Let $K$ and $L$ be fields with $K \subseteq L$, and suppose that two members of $M_n(K)$ are
conjugate via $GL(n, L)$. Prove that they are conjugate via $GL(n, K)$.

Problems 36-39 concern symmetric polynomials in n indeterminates over a field. Let
F be a field, and let $R = F[X_1, \dots, X_n]$. If $\sigma \in \mathfrak{S}_n$ is a permutation, then there
is a corresponding substitution homomorphism of rings $\sigma^* : R \to R$ fixing F and
carrying each $X_j$ into $X_{\sigma(j)}$. A symmetric polynomial A in R is a member of R
for which $\sigma^* A = A$ for every permutation $\sigma$. The symmetric polynomials form a
subring  of  R  containing  the  constants.  The  main  result  about  symmetric  polynomials 
is  that  every  symmetric  polynomial  is  a  polynomial  in  the  “elementary  symmetric 
polynomials”;  these  will  be  defined  below. 

36. Prove that the ring homomorphisms $\sigma^*$ satisfy $(\sigma\tau)^* = \sigma^*\tau^*$. Deduce that each
$\sigma^* : R \to R$ is an isomorphism.

37.  Prove  that  the  homogeneous-polynomial  expansion  of  any  symmetric  polynomial 
is  into  symmetric  polynomials. 

38. For each permutation $\sigma$, let $\sigma^{**}$ be the substitution homomorphism of
$R[X] = F[X_1, \dots, X_n, X]$ acting as $\sigma^*$ on R and carrying X to itself.

(a) Prove that $(\sigma\tau)^{**} = \sigma^{**}\tau^{**}$ and that each $\sigma^{**}$ is a ring isomorphism of
R[X].

(b) Prove that each coefficient in R[X] of any polynomial fixed by all $\sigma^{**}$ is a
symmetric  polynomial  in  R. 

(c) The polynomial $(X - X_1)(X - X_2)\cdots(X - X_n)$ is fixed by all $\sigma^{**}$, and its
coefficients are, apart from sign, called the elementary symmetric polynomials. Show that
they are

$$E_1 = \sum_i X_i, \quad E_2 = \sum_{i<j} X_iX_j, \quad E_3 = \sum_{i<j<k} X_iX_jX_k, \quad \dots, \quad E_n = X_1X_2\cdots X_n.$$
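Part (c) can be spot-checked numerically by substituting numbers for $X_1, \dots, X_n$: the coefficient of $X^{n-r}$ in the expanded product should equal $(-1)^r E_r$. The Python sketch below (hypothetical function names) computes each $E_r$ from its defining sum and compares it with the coefficients obtained by multiplying out the factors one at a time.

```python
from itertools import combinations
from math import prod


def elementary_symmetric(values):
    """[E_1, ..., E_n] evaluated at the given numbers, from the defining sums
    E_r = sum of x_{i_1} ... x_{i_r} over i_1 < ... < i_r."""
    n = len(values)
    return [sum(prod(c) for c in combinations(values, r)) for r in range(1, n + 1)]


def expand_product(values):
    """Coefficients of (X - x_1)(X - x_2)...(X - x_n), highest power first."""
    coeffs = [1]
    for x in values:
        times_X = coeffs + [0]                          # multiply by X
        times_minus_x = [0] + [-x * c for c in coeffs]  # multiply by -x
        coeffs = [a + b for a, b in zip(times_X, times_minus_x)]
    return coeffs


values = [1, 2, 3]
E = elementary_symmetric(values)   # [6, 11, 6]
coeffs = expand_product(values)    # [1, -6, 11, -6]
# the coefficient of X^(n-r) is (-1)^r E_r
assert all(coeffs[r] == (-1) ** r * E[r - 1] for r in range(1, len(values) + 1))
```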

39. Order the monomials of total degree m by saying that the monomial $aX_1^{k_1}\cdots X_n^{k_n}$
with $a \ne 0$ and $\sum k_j = m$ is greater than the monomial $a'X_1^{l_1}\cdots X_n^{l_n}$ with $a' \ne 0$
and $\sum l_j = m$ if the first j for which $k_j \ne l_j$ has $k_j > l_j$.
(a) If $A(X_1, \dots, X_n)$ is a nonzero symmetric polynomial homogeneous of degree m
and if $aX_1^{k_1}\cdots X_n^{k_n}$ is its nonzero monomial that is highest in the
above order, why must it be true that $k_1 \ge k_2 \ge \cdots \ge k_n$?

(b) Verify that the largest monomial in $E_1^{c_1}\cdots E_n^{c_n}$ in the ordering is

$$X_1^{c_1+c_2+\cdots+c_n}\,X_2^{c_2+\cdots+c_n}\cdots X_n^{c_n}.$$


(c) Show that if $A(X_1, \dots, X_n)$ is a nonzero symmetric polynomial homogeneous
of degree m, then there exist a symmetric polynomial $M = E_1^{c_1}\cdots E_n^{c_n}$
homogeneous of degree m and a scalar r such that the largest monomials in
A and rM are equal.

(d) With notation as in (c), show that $A - rM$ equals 0 or else the largest
monomial of A is greater than the largest monomial of $A - rM$.

(e)  Deduce  that  every  symmetric  polynomial  is  a  polynomial  in  the  elementary 
symmetric  polynomials. 
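The argument in Problem 39 is effectively an algorithm: repeatedly find the largest monomial, read off the exponents $c_j = k_j - k_{j+1}$ from (b), and subtract the matching multiple of $E_1^{c_1}\cdots E_n^{c_n}$. The Python sketch below follows those steps for polynomials with rational coefficients, stored as dictionaries from exponent tuples to `Fraction` coefficients (a representation chosen here, not taken from the text). Python's built-in tuple comparison orders exponent tuples exactly as in Problem 39. Function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def poly_mul(p, q):
    """Product of polynomials stored as {exponent tuple: coefficient}."""
    out = {}
    for e1, c1 in p.items():
        for e2, c2 in q.items():
            e = tuple(a + b for a, b in zip(e1, e2))
            out[e] = out.get(e, 0) + c1 * c2
    return {e: c for e, c in out.items() if c != 0}


def elementary(n, r):
    """E_r in X_1, ..., X_n: the sum of X_{i_1} ... X_{i_r} over i_1 < ... < i_r."""
    return {tuple(1 if i in idx else 0 for i in range(n)): Fraction(1)
            for idx in combinations(range(n), r)}


def to_elementary(A, n):
    """Rewrite a symmetric polynomial A in n indeterminates as a polynomial in
    E_1, ..., E_n.  The result maps (c_1, ..., c_n) to the coefficient of
    E_1^c_1 ... E_n^c_n."""
    A = {e: Fraction(c) for e, c in A.items() if c != 0}
    result = {}
    while A:
        k = max(A)  # tuple comparison = the ordering of Problem 39
        a = A[k]
        # Problem 39b: c_j = k_j - k_{j+1}, with k_{n+1} = 0
        c = tuple(k[j] - (k[j + 1] if j + 1 < n else 0) for j in range(n))
        if min(c) < 0:
            raise ValueError("input is not symmetric")
        M = {(0,) * n: Fraction(1)}
        for j in range(n):
            for _ in range(c[j]):
                M = poly_mul(M, elementary(n, j + 1))
        result[c] = result.get(c, 0) + a
        # subtract a * M, which cancels the largest monomial of A (Problem 39d)
        for e, coeff in M.items():
            A[e] = A.get(e, 0) - a * coeff
            if A[e] == 0:
                del A[e]
    return result


# Power sum X_1^3 + X_2^3 + X_3^3, which equals E_1^3 - 3 E_1 E_2 + 3 E_3
p3 = {(3, 0, 0): 1, (0, 3, 0): 1, (0, 0, 3): 1}
rewritten = to_elementary(p3, 3)
```

For two indeterminates the same routine rewrites $X_1^2 + X_2^2$ as $E_1^2 - 2E_2$, and for the cubic power sum above it returns the exponent-to-coefficient map of $E_1^3 - 3E_1E_2 + 3E_3$.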


Problems 40-43 concern the Pfaffian of a $(2n)$-by-$(2n)$ alternating matrix $X = [x_{ij}]$
with entries in a field K. Here “alternating” means that $x_{ji} = -x_{ij}$ for all i and j
and $x_{ii} = 0$ for all i. The Pfaffian is the polynomial in the entries of X with integer
coefficients given by

$$\operatorname{Pfaff}(X) = \sum_{\substack{\text{certain } \tau\text{'s} \\ \text{in } \mathfrak{S}_{2n}}} (\operatorname{sgn}\tau) \prod_{k=1}^{n} x_{\tau(2k-1),\tau(2k)},$$


where the sum is taken over those permutations $\tau$ such that $\tau(2k-1) < \tau(2k)$ for
$1 \le k \le n$ and such that $\tau(1) < \tau(3) < \cdots < \tau(2n-1)$. The Pfaffian was introduced
in Problems 23-28 at the end of Chapter VI. It was shown in those problems that
the Pfaffian satisfies $\det X = (\operatorname{Pfaff}(X))^2$. The present problems will make use of
that  result  but  of  no  other  results  from  Chapter  VI.  They  will  also  make  use  of  facts 
concerning  continuous  functions  and  connected  open  subsets  of  Euclidean  space. 
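The defining sum can be evaluated literally for small matrices. The Python sketch below enumerates the permutations τ in the sum by repeatedly pairing the smallest unused index with a larger one, which produces exactly the τ with $\tau(2k-1) < \tau(2k)$ and $\tau(1) < \tau(3) < \cdots < \tau(2n-1)$. Indices are 0-based, and the function names and the determinant routine (Gaussian elimination over the rationals) are illustrative choices. It is only a numerical check of $\det X = (\operatorname{Pfaff}(X))^2$ on an example, not a proof.

```python
from fractions import Fraction


def matchings(indices):
    """Yield lists of pairs (i, j), i < j, covering `indices`, where each pair
    starts with the smallest index not yet used.  Reading the pairs in order
    gives tau(1), tau(2), ..., tau(2n) for exactly the tau in the Pfaffian sum."""
    if not indices:
        yield []
        return
    first, rest = indices[0], indices[1:]
    for pos, partner in enumerate(rest):
        remaining = rest[:pos] + rest[pos + 1:]
        for m in matchings(remaining):
            yield [(first, partner)] + m


def sign(perm):
    """Sign of a permutation (given as a list) from its number of inversions."""
    inversions = sum(1 for a in range(len(perm))
                     for b in range(a + 1, len(perm)) if perm[a] > perm[b])
    return -1 if inversions % 2 else 1


def pfaffian(X):
    total = Fraction(0)
    for m in matchings(list(range(len(X)))):
        term = Fraction(sign([i for pair in m for i in pair]))
        for i, j in m:
            term *= X[i][j]
        total += term
    return total


def determinant(A):
    """Determinant by Gaussian elimination with exact fractions."""
    M = [[Fraction(x) for x in row] for row in A]
    n, det = len(M), Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            det = -det
        det *= M[col][col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n):
                M[r][c] -= factor * M[col][c]
    return det


def alternating(upper, size):
    """Alternating matrix from its entries {(i, j): x_ij} with i < j (0-based)."""
    X = [[Fraction(0)] * size for _ in range(size)]
    for (i, j), v in upper.items():
        X[i][j], X[j][i] = Fraction(v), -Fraction(v)
    return X


def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


X = alternating({(0, 1): 1, (0, 2): 2, (0, 3): 3, (1, 2): 4, (1, 3): 5, (2, 3): 6}, 4)
pf = pfaffian(X)      # expected 8
det = determinant(X)  # expected 64 = 8**2
```

For 4-by-4 matrices the sum has three terms, $x_{12}x_{34} - x_{13}x_{24} + x_{14}x_{23}$, which for this example is $6 - 10 + 12 = 8$. The same helpers can be used to test the identity of Problems 41 and 42 on particular matrices A.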

40. Prove by induction on m that the open subset of $\mathbb{C}^m$ on which a nonzero
polynomial function $P(z_1, \dots, z_m)$ is nonzero is pathwise connected and therefore
connected.

41.  For  this  problem  let  K  =  C. 

(a) For any two matrices A and X in $M_{2n}(\mathbb{C})$ with X alternating, prove that
$\operatorname{Pfaff}(A^tXA) = \pm(\det A)\operatorname{Pfaff}(X)$ with the sign depending on A and X.

(b) Fix X, and allow A to vary. Using Problem 40, prove that the sign is always
positive in (a). That is, prove that $\operatorname{Pfaff}(A^tXA) = (\det A)\operatorname{Pfaff}(X)$.
42. For this problem let K be any field. By regarding the expressions $\operatorname{Pfaff}(A^tXA)$
and $(\det A)\operatorname{Pfaff}(X)$ as polynomials with coefficients in $\mathbb{Z}$ in the indeterminates
$A_{ij}$ for all i and j and the indeterminates $X_{ij}$ for $i < j$, and using the principle
of permanence of identities in Section V.2, prove that $\operatorname{Pfaff}(A^tXA) = (\det A)\operatorname{Pfaff}(X)$
whenever A and X are in $M_{2n}(K)$ and X is alternating.

43. Section VI.5 defines a particular alternating matrix J for which $\operatorname{Pfaff}(J) = 1$.
A symplectic matrix g over K is one for which $g^tJg = J$. Prove that every
symplectic matrix has determinant 1.

Problems  44-47  concern  Dedekind  domains.  Let  R  be  such  a  domain.  It  is  to  be 
proved that each nonzero ideal I is doubly generated in the sense that $I = Ra + Rb$
for  suitable  members  a  and  b  of  R. 

44. Let $R_1, \dots, R_n$ be nonzero commutative rings with identity, not necessarily
integral domains. Prove that if every ideal of each $R_j$ is principal, then every
ideal in $R_1 \times \cdots \times R_n$ is principal.

45.  Let  P  be  a  nonzero  prime  ideal,  and  let  k  be  a  positive  integer. 

(a) Prove that the only nonzero proper ideals in $R/P^k$ are $P/P^k, P^2/P^k, \dots,
P^{k-1}/P^k$.

(b) Using the element $\pi$ in the statement of Corollary 8.59, prove that each of
the  ideals  in  (a)  is  principal. 

46.  Combining  Corollary  8.63  with  Problems  44  and  45,  conclude  that  the  quotient 
of  R  by  any  nonzero  proper  ideal  has  only  principal  ideals. 

47. Let I be a nonzero proper ideal in R. By letting a be any nonzero element of I
and by applying (c) in the previous problem to the ideal $I/(a)$ of $R/(a)$, prove
that  I  =  Ra  +  Rb  for  a  suitable  b  in  I. 

Problems  48-53  introduce  and  classify  “fractional  ideals”  in  Dedekind  domains.  Let 
R  be  a  Dedekind  domain,  regarded  as  a  subring  of  its  field  of  fractions  F .  A  fractional 
ideal  in  F  is  a  finitely  generated  R  submodule  of  F . 

48.  Prove  that  the  fractional  ideals  in  F  that  lie  in  R  are  exactly  the  ordinary  ideals 
of  R. 

49.  Prove  for  any  fractional  ideal  M  that  there  exists  a  nonzero  member  a  of  F  such 
that  aM  lies  in  R  and  hence  is  an  ordinary  ideal.  Conclude  that  the  product  of 
two  fractional  ideals  is  a  fractional  ideal. 

50. Prove that if I is a nonzero ideal of R and if $I^{-1}$ is defined by

$$I^{-1} = \{x \in F \mid xI \subseteq R\},$$

then $I^{-1}$ is a fractional ideal in F. Conclude that if P is a prime ideal in R, then
$P^{-1}$ as defined in Lemma 8.58 is a fractional ideal in F.
51. Prove, by arguing with an ideal that is maximal among those for which the
statement is false, that to any nonzero ideal I in R corresponds some fractional
ideal M of F such that $IM = R$.

52. Prove in the notation of the previous two problems that $M = I^{-1}$.

53. Deduce that every nonzero fractional ideal is of the form $IJ^{-1}$, where I and J
are nonzero ideals. Conclude that

(a) the nonzero fractional ideals are exactly all products $\prod_{i=1}^{n} P_i^{k_i}$, where the $P_i$
are distinct nonzero prime ideals and the $k_i$ are arbitrary nonzero integers,
positive or negative,

(b)  the  nonzero  fractional  ideals  form  a  group. 


CHAPTER  IX 


Fields  and  Galois  Theory 


Abstract.  This  chapter  develops  some  general  theory  for  field  extensions  and  then  goes  on  to 
study  Galois  groups  and  their  uses.  More  than  half  the  chapter  illustrates  by  example  the  power 
and  usefulness  of  the  theory  of  Galois  groups.  Prerequisite  material  from  Chapter  VIII  consists 
of  Sections  1-6  for  Sections  1-13  of  the  present  chapter,  and  it  consists  of  all  of  Chapter  VIII  for 
Sections  14-17  of  the  present  chapter. 

Sections  1-2  introduce  field  extensions.  These  are  inclusions  of  a  base  field  in  a  larger  field. 
The  fundamental  construction  is  of  a  simple  extension,  algebraic  or  transcendental,  and  the  next 
construction  is  of  a  splitting  field.  An  algebraic  simple  extension  is  made  by  adjoining  a  root  of  an 
irreducible  polynomial  over  the  base  field,  and  a  splitting  field  is  made  by  adjoining  all  the  roots  of 
such  a  polynomial.  For  both  constructions,  there  are  existence  and  uniqueness  theorems. 

Section  3  classifies  finite  fields.  For  each  integer  q  that  is  a  power  of  some  prime  number,  there 
exists  one  and  only  one  finite  field  of  order  q ,  up  to  isomorphism.  One  finite  field  is  an  extension  of 
another,  apart  from  isomorphisms,  if  and  only  if  the  order  of  the  first  field  is  a  power  of  the  order  of 
the  second  field. 

Section  4  concerns  algebraic  closure.  Any  field  has  an  algebraic  extension  in  which  each 
nonconstant  polynomial  over  the  extension  field  has  a  root.  Such  a  field  exists  and  is  unique  up 
to  isomorphism. 

Section  5  applies  the  theory  of  Sections  1-2  to  the  problem  of  constructibility  with  straightedge 
and  compass.  First  the  problem  is  translated  into  the  language  of  field  theory.  Then  it  is  shown  that 
three desired constructions from antiquity are impossible: “doubling a cube,” trisecting an arbitrary
constructible angle, and “squaring a circle.” The full proof of the impossibility of squaring a circle
uses the fact that π is transcendental over the rationals, and the proof of this property of π is deferred
to Section 14. Section 5 concludes with a statement of the theorem of Gauss identifying integers n
such that a regular n-gon is constructible and with some preliminary steps toward its proof.

Sections 6-8 introduce Galois groups and develop their theory. The theory applies to a field
extension with three properties: it is finite-dimensional, separable, and normal. Such an
extension is called a “finite Galois extension.” The Fundamental Theorem of Galois Theory says in
this  case  that  the  intermediate  extensions  are  in  one-one  correspondence  with  subgroups  of  the  Galois 
group,  and  it  gives  formulas  relating  the  corresponding  intermediate  fields  and  Galois  subgroups. 

Sections 9-11 give three standard initial applications of Galois groups. The first is to proving the
theorem of Gauss about constructibility of regular n-gons, the second is to deriving the Fundamental
Theorem of Algebra from the Intermediate Value Theorem, and the third is to proving the necessity
of the condition of Abel and Galois for solvability of polynomial equations by radicals, namely that the
Galois group of the splitting field of the polynomial have a composition series with abelian quotients.

Sections  12-13  begin  to  derive  quantitative  information,  rather  than  qualitative  information,  from 
Galois  groups.  Section  12  shows  how  an  appropriate  Galois  group  points  to  the  specific  steps  in 
the  construction  of  a  regular  n-gon  when  the  construction  is  possible.  Section  13  introduces  a  tool 
known as Lagrange resolvents, a precursor of modern harmonic analysis. Lagrange resolvents are
used  first  to  show  that  Galois  extensions  in  characteristic  0  with  cyclic  Galois  group  of  prime  order  p 
are  simple  extensions  obtained  by  adjoining  a  pth  root,  provided  all  the  pth  roots  of  1  lie  in  the  base 
field.  Lagrange  resolvents  and  this  theorem  about  cyclic  Galois  groups  combine  to  yield  a  derivation 
of  Cardan's  formula  for  solving  general  cubic  equations. 

Section 14 begins the part of the chapter that depends on results in the later sections of
Chapter VIII. Section 14 itself contains a proof that π is transcendental; the proof is a nice illustration of
the interplay of algebra and elementary real analysis.

Section 15 introduces the field polynomial of an element in a finite-dimensional extension field.
The  determinant  and  trace  of  this  polynomial  are  called  the  norm  and  trace  of  the  element.  The 
section  gives  various  formulas  for  the  norm  and  trace,  including  formulas  involving  Galois  groups. 
With  these  formulas  in  hand,  the  section  concludes  by  completing  the  proof  of  Theorem  8.54  about 
extending Dedekind domains, part of the proof having been deferred from Section VIII.11.

Section 16 discusses how prime ideals split when one passes, for example, from the integers to
the  algebraic  integers  in  a  number  field.  The  topic  here  was  broached  in  the  motivating  examples 
for  algebraic  number  theory  and  algebraic  geometry  as  introduced  in  Section  VIII.7,  and  it  was  the 
main  topic  of  concern  in  that  section.  The  present  results  put  matters  into  a  wider  context. 

Section  17  gives  two  tools  that  sometimes  help  in  identifying  Galois  groups,  particularly  of 
splitting  fields  of  monic  polynomials  with  integer  coefficients.  One  tool  uses  the  discriminant  of  the 
polynomial.  The  other  uses  reduction  of  the  coefficients  modulo  various  primes. 


1.  Algebraic  Elements 

If  K  and  k  are  fields  such  that  k  is  a  subfield  of  K,  we  say  that  K  is  a  field 
extension  of  k.  When  it  is  necessary  to  refer  to  this  situation  in  some  piece  of 
notation, we often write K/k to indicate the field extension. In this section we
shall study field extensions in a general way, and in the next section we shall
discuss  constructions  and  uniqueness  results  involving  them. 

If K and K' are two fields and if $\varphi$ is a ring homomorphism of K into K' with
$\varphi(1) = 1$, then $\varphi$ is automatically one-one since K has no nontrivial ideals. We
refer to $\varphi$ as a field map or field mapping.¹ If K and K' are both field extensions
of a field k and if the restriction of a field map $\varphi$ to k is the identity, then $\varphi$ is
called a k field map or a field map fixing k. The terminology “k field map” is
consistent with the view that K and K' are two R algebras for R = k in the sense
of Examples 6 and 15 in Section VIII.1, and that the field map in question is
just an R algebra homomorphism.

If a field map $\varphi : K \to K'$ is onto K', then $\varphi$ is a field isomorphism; it is a
k field isomorphism if K and K' are extensions of k and $\varphi$ is the identity on k.
When K = K' and $\varphi$ is onto K', $\varphi$ is called an automorphism of K; if also $\varphi$ is
the identity on a subfield k, then $\varphi$ is called a k automorphism of K.


¹This is the notion of morphism in the category of fields.
Throughout this section we let K/k be a field extension. If $x_1, \dots, x_n$ are
members of K, we let

$k[x_1, \dots, x_n]$ = subring of K generated by k and $x_1, \dots, x_n$,
$k(x_1, \dots, x_n)$ = subfield of K generated by k and $x_1, \dots, x_n$.

The latter, in more detail, means the set of all quotients $ab^{-1}$ with a and b in
$k[x_1, \dots, x_n]$ and with $b \ne 0$. It is referred to as the field obtained by adjoining
$x_1, \dots, x_n$ to k. Because of this description of the elements of $k(x_1, \dots, x_n)$, the
field $k(x_1, \dots, x_n)$ can be regarded as the field of fractions F of $k[x_1, \dots, x_n]$. In
fact, we argue as follows: let $\eta : k[x_1, \dots, x_n] \to F$ be the natural ring
homomorphism $a \mapsto$ class of $(a, 1)$ of $k[x_1, \dots, x_n]$ into its field of fractions; then the
universal mapping property of F stated in Proposition 8.6 gives a factorization of
the inclusion $\iota : k[x_1, \dots, x_n] \to k(x_1, \dots, x_n)$ as $\iota = \widetilde{\iota}\,\eta$, and the field mapping
$\widetilde{\iota}$ has to be onto $k(x_1, \dots, x_n)$ since the class of $(a, b)$ maps to the member $ab^{-1}$
of $k(x_1, \dots, x_n)$.

As  in  Chapter  IV  and  elsewhere,  we  let  k[X]  be  the  ring  of  polynomials  in 
the indeterminate X with coefficients in k. For each x in K, we have a unique
substitution homomorphism $\varphi_x : k[X] \to k[x]$ carrying k to itself and carrying
X to x. We say that x is algebraic over k if $\varphi_x$ is not one-one, i.e., if x is a root
of some nonzero polynomial in k[X], and that x is transcendental over k if $\varphi_x$
is one-one.

Examples. 

(1) If $k = \mathbb{R}$, if $K = \mathbb{C}$, and if x is the usual element $i = \sqrt{-1}$, then
$\varphi_i(X^2 + 1) = 0$, and i is algebraic over $\mathbb{R}$.

(2) If $k = \mathbb{Q}$, if $K = \mathbb{C}$, and if $\theta$ is a complex number with the property that
$\theta^n + c_{n-1}\theta^{n-1} + \cdots + c_1\theta + c_0 = 0$ for some n and for some coefficients in $\mathbb{Q}$,
then $\theta$ is algebraic over $\mathbb{Q}$. This situation was the subject of Proposition 4.1, of
Example 2 in Section IV.4, and of Example 10 in Section VIII.1.

(3) Let $k = \mathbb{Q}$ and $K = \mathbb{C}$. For π equal to the usual trigonometric constant,
given as the least positive real such that $e^{i\pi} = -1$ when $e^z = \sum_{n=0}^{\infty} z^n/n!$, it will
be proved in Section 14 that there is no nonzero polynomial F(X) in $\mathbb{Q}[X]$ with $F(\pi) = 0$,
and π is consequently transcendental over $\mathbb{Q}$.

(4)  If  k  =  Z/2Z  and  K  is  the  4-element  field  constructed  in  Example  3  of 
fields  in  Section  IV.4,  then  any  element  of  K  is  algebraic  over  k. 

(5) If $k = \mathbb{C}(X)$ and if $K = \mathbb{C}(X)\bigl[\sqrt{(X-1)X(X+1)}\,\bigr]$ as with the ring R'
in Section VIII.7 and as in Example 3 of integral closures in Section VIII.9, then
$\sqrt{(X-1)X(X+1)}$ is algebraic over $\mathbb{C}(X)$.
Suppose that x in K is algebraic over k. Then

$$\ker\varphi_x = \{F(X) \in k[X] \mid F(x) = 0\}$$

is an ideal in k[X] that is necessarily nonzero and principal. A generator is
determined up to a constant factor as any nonzero polynomial in the ideal that has
lowest possible degree, and we might as well take this polynomial to be monic.
Thus $\ker\varphi_x$ is of the form $(F_0(X))$ for some unique monic polynomial $F_0(X)$, and
this polynomial $F_0(X)$ is called the minimal polynomial of x over k. Review of
the example at the end of Section VIII.3 may help motivate the first five results
below.

Proposition 9.1. If $x \in K$ is algebraic over k, then the minimal polynomial of
x over k is prime as a polynomial in k[X].

PROOF. Let F(X) be the minimal polynomial, and suppose that F(X) factors
nontrivially in k[X] as F(X) = G(X)H(X). Since F(x) = 0, either G(x) = 0 or H(x) = 0,
and then we have a contradiction to the fact that F has minimal degree among all
nonzero polynomials vanishing at x. □

Theorem 9.2. If $x \in K$ is algebraic over k, then the field k(x) coincides with
the ring k[x]. Moreover, if the minimal polynomial of x over k has degree n,
then each element of k(x) has a unique expansion as

$$c_{n-1}x^{n-1} + c_{n-2}x^{n-2} + \cdots + c_1x + c_0 \quad \text{with all } c_i \in k.$$

PROOF. Since the substitution ring homomorphism $\varphi_x$ carries k[X] onto k[x],
we have an isomorphism of rings $k[x] \cong k[X]/\ker\varphi_x = k[X]/(F_0(X))$, where
$F_0(X)$ is the minimal polynomial of x over k. Since $F_0$ is prime, $(F_0(X))$ is a
nonzero prime ideal and hence is maximal. Thus k[x] is a field. Consequently
k(x) = k[x].

Any element in k[x], hence in k(x), is a polynomial in x. Since $F_0(x) = 0$,
we can solve $F_0(x) = 0$ for its leading term $x^n$, obtaining $x^n = G(x)$, where
G(X) = 0 or $\deg G(X) \le n - 1$. Using this relation repeatedly, we can lower the degree
of any polynomial expression in x to at most n − 1. Thus the expansions in the statement of the
theorem yield all the members of k[x]. If an element has two such expansions,
we subtract them and obtain a nonzero polynomial H(X) of degree at most n − 1
with H(x) = 0, in contradiction to the minimality of the degree of $F_0(X)$. □
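Theorem 9.2 makes k(x) computable: an element is stored as its coefficient list $(c_0, \dots, c_{n-1})$, and products are reduced using $F_0(x) = 0$. The proof obtains inverses from the maximality of $(F_0(X))$; the Python sketch below instead finds them with the extended Euclidean algorithm in k[X], a standard constructive alternative that the text does not use here. It assumes k = Q with exact fractions, and all function names are hypothetical.

```python
from fractions import Fraction

# Polynomials over Q are lists of coefficients, constant term first.


def trim(p):
    """Convert to Fractions and drop zero leading coefficients ([] is the zero polynomial)."""
    p = [Fraction(c) for c in p]
    while p and p[-1] == 0:
        p.pop()
    return p


def poly_mul(p, q):
    p, q = trim(p), trim(q)
    if not p or not q:
        return []
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return trim(out)


def poly_sub(p, q):
    p, q = trim(p), trim(q)
    n = max(len(p), len(q))
    p += [Fraction(0)] * (n - len(p))
    q += [Fraction(0)] * (n - len(q))
    return trim([a - b for a, b in zip(p, q)])


def poly_divmod(p, d):
    """Quotient and remainder of p divided by the nonzero polynomial d."""
    p, d = trim(p), trim(d)
    quotient = [Fraction(0)] * max(len(p) - len(d) + 1, 0)
    while len(p) >= len(d):
        shift = len(p) - len(d)
        c = p[-1] / d[-1]
        quotient[shift] = c
        p = poly_sub(p, [Fraction(0)] * shift + [c * a for a in d])
    return trim(quotient), p


def reduce_mod(p, F):
    """The expansion c_0 + c_1 x + ... + c_{n-1} x^{n-1} of Theorem 9.2."""
    return poly_divmod(p, F)[1]


def mul_mod(p, q, F):
    return reduce_mod(poly_mul(p, q), F)


def inverse_mod(a, F):
    """Inverse of a in k[X]/(F) for prime F, via the extended Euclidean algorithm.
    Invariant: r0 = s0 * a and r1 = s1 * a modulo F."""
    r0, r1 = trim(F), reduce_mod(a, F)
    s0, s1 = [], [Fraction(1)]
    while r1:
        q, r = poly_divmod(r0, r1)
        r0, r1 = r1, r
        s0, s1 = s1, poly_sub(s0, poly_mul(q, s1))
    if len(r0) != 1:  # the gcd is not a nonzero constant
        raise ValueError("a is 0 modulo F, or F is not prime")
    return reduce_mod([c / r0[0] for c in s0], F)


F = [-2, 0, 0, 1]                     # X^3 - 2, the minimal polynomial of x = cube root of 2
x4 = reduce_mod([0, 0, 0, 0, 1], F)   # x^4 = 2x
inv = inverse_mod([1, 1], F)          # (1 + x)^(-1) = (1 - x + x^2)/3
```

For $x = \sqrt[3]{2}$, with $F_0(X) = X^3 - 2$, the sketch reduces $x^4$ to $2x$ and returns $(1 - x + x^2)/3$ as the inverse of $1 + x$, in agreement with $(1 + x)(1 - x + x^2) = 1 + x^3 = 3$.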

Corollary 9.3. If $x \in K$ is algebraic over k, then the field k(x), regarded as
a vector space over k, is of dimension n, where n is the degree of the minimal
polynomial of x over k. The elements $1, x, x^2, \dots, x^{n-1}$ form a basis of k(x)
over k.

PROOF.  This  is  just  a  restatement  of  the  second  conclusion  of  Theorem  9.2.  □ 
We say that the field extension K/k is an algebraic extension if every element
of  K  is  algebraic  over  k. 

Proposition 9.4. If the vector-space dimension of K over k is some finite n,
then K is an algebraic extension of k, and each element x of K has some nonzero
polynomial F(X) in k[X] of degree at most n for which F(x) = 0.

PROOF. This is immediate since the n + 1 elements $1, x, x^2, \dots, x^n$ of K have to be
linearly dependent over k, and a nontrivial dependence relation yields such a polynomial. □

When K/k is a field extension, we write [K : k] for the vector-space dimension
$\dim_k K$, and we call this the degree of K over k. If [K : k] is finite, we say that
K  is  a  finite  extension  of  k,  or  finite  algebraic  extension  of  k,  the  condition 
“algebraic”  being  automatic  by  Proposition  9.4. 

Corollary 9.5. If x is in K, then x is algebraic over k if and only if k(x) is a
finite algebraic extension of k. In this case the minimal polynomial of x over k
has degree [k(x) : k].

PROOF. If x is algebraic over k, then [k(x) : k] is finite and is the degree of the
minimal polynomial of x over k, by Corollary 9.3. Proposition 9.4 shows in this
case that k(x) is a finite algebraic extension. If x is transcendental over k, then the
substitution homomorphism $\varphi_x$ is one-one, and $\dim_k k(x) \ge \dim_k k[X] = +\infty$. □

Theorem 9.6. Let k, K, and L be fields with $k \subseteq K \subseteq L$, and suppose that
[K : k] = n and [L : K] = m, finite or infinite. Let $\{\omega_1, \omega_2, \dots\}$ be a vector-space
basis of K over k, and let $\{\xi_1, \xi_2, \dots\}$ be a vector-space basis of L over K. Then
the nm products $\omega_i\xi_j$ form a basis of L over k.

Proof of spanning. If $\xi$ is in L, write $\xi = \sum_j a_j\xi_j$ with each $a_j$ in K and
with only finitely many $a_j$'s not 0. Then expand each $a_j$ in terms of the $\omega_i$'s, and
substitute. □

Proof of linear independence. Let $\sum_{i,j} c_{ij}\omega_i\xi_j = 0$ with the $c_{ij}$'s in k, only finitely many nonzero.
Since the members $\xi_j$ of L are linearly independent over K, $\sum_i c_{ij}\omega_i = 0$ for
each j. Since the members $\omega_i$ of K are linearly independent over k, $c_{ij} = 0$ for
all i and j. □

Corollary 9.7. If k, K, and L are fields with $k \subseteq K \subseteq L$, then


[L  :  k]  =  [L  :  K]  [K  :  k] . 


PROOF.  This  is  immediate  by  counting  basis  elements  in  Theorem  9.6.  □ 
Theorem 9.8. If K/k is a field extension and if $x_1, \dots, x_n$ are members of K
that are algebraic over k, then $k(x_1, \dots, x_n)$ is a finite algebraic extension of k.

Remark.  If  a  finite  algebraic  extension  of  k  turns  out  to  be  of  the  form  k(x) 
for  some  x,  we  say  that  the  extension  is  a  simple  algebraic  extension. 

PROOF. Since $x_i$ is algebraic over k, it is algebraic over $k(x_1, \dots, x_{i-1})$. Hence
$[k(x_1, \dots, x_i) : k(x_1, \dots, x_{i-1})]$ is finite. Applying Corollary 9.7 repeatedly, we
see that $k(x_1, \dots, x_n)$ is a finite extension of k. Proposition 9.4 shows that it is a
finite  algebraic  extension.  □ 

Example. The sum $\sqrt{2} + \sqrt[3]{2}$ is algebraic over $\mathbb{Q}$, as a consequence of Theorem
9.8. This fact suggests Corollary 9.9 below.
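Proposition 9.4 also suggests how to find an explicit polynomial: compute powers of $t = \sqrt{2} + \sqrt[3]{2}$ in a basis of $\mathbb{Q}(\sqrt{2}, \sqrt[3]{2})$ and stop at the first linear dependence over $\mathbb{Q}$, which is then the minimal polynomial of t. The Python sketch below assumes the basis $\sqrt{2}^{\,a}\sqrt[3]{2}^{\,b}$ with $a \in \{0, 1\}$ and $b \in \{0, 1, 2\}$; this is a basis by Theorem 9.6, because $[\mathbb{Q}(\sqrt{2}, \sqrt[3]{2}) : \mathbb{Q}]$ is divisible by 2 and by 3 (Corollary 9.7) and is at most 6. Function names are hypothetical.

```python
from fractions import Fraction

# An element of Q(sqrt2, cbrt2) is stored as {(a, b): coefficient of sqrt2^a * cbrt2^b}.
BASIS = [(a, b) for a in range(2) for b in range(3)]


def mul(u, v):
    out = {key: Fraction(0) for key in BASIS}
    for (a1, b1), c1 in u.items():
        for (a2, b2), c2 in v.items():
            a, b, c = a1 + a2, b1 + b2, c1 * c2
            if a >= 2:   # sqrt2^2 = 2
                a, c = a - 2, 2 * c
            if b >= 3:   # cbrt2^3 = 2
                b, c = b - 3, 2 * c
            out[(a, b)] += c
    return out


def solve(columns, target):
    """Coefficients c with sum_i c[i] * columns[i] == target, or None if there are none.
    Gauss-Jordan elimination; assumes the columns are linearly independent."""
    rows, m = len(target), len(columns)
    aug = [[columns[j][i] for j in range(m)] + [target[i]] for i in range(rows)]
    for col in range(m):
        pivot = next(i for i in range(col, rows) if aug[i][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [x / pv for x in aug[col]]
        for i in range(rows):
            if i != col and aug[i][col] != 0:
                f = aug[i][col]
                aug[i] = [x - f * y for x, y in zip(aug[i], aug[col])]
    if any(aug[i][m] != 0 for i in range(m, rows)):
        return None          # target is not in the span of the columns
    return [aug[i][m] for i in range(m)]


def minimal_polynomial(t):
    """Monic minimal polynomial of t over Q (constant term first), found as the
    first linear dependence among 1, t, t^2, ... (compare Proposition 9.4)."""
    powers = [{key: Fraction(int(key == (0, 0))) for key in BASIS}]
    while True:
        nxt = mul(powers[-1], t)
        columns = [[p[key] for key in BASIS] for p in powers]
        c = solve(columns, [nxt[key] for key in BASIS])
        if c is not None:
            return [-x for x in c] + [Fraction(1)]
        powers.append(nxt)


t = {(1, 0): Fraction(1), (0, 1): Fraction(1)}   # sqrt2 + cbrt2
coeffs = minimal_polynomial(t)
```

The call returns the coefficients −4, −24, 12, −4, −6, 0, 1 (constant term first), that is, $X^6 - 6X^4 - 4X^3 + 12X^2 - 24X - 4$. Its degree 6 shows that $\sqrt{2} + \sqrt[3]{2}$ alone generates $\mathbb{Q}(\sqrt{2}, \sqrt[3]{2})$.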

Corollary 9.9. If K/k is a field extension, then the elements of K that are
algebraic over k form a field.

PROOF. If x and y in K are algebraic over k, then k(x, y) is a finite algebraic
extension of k, according to Theorem 9.8. This extension contains $x \pm y$ and xy,
and it contains $x^{-1}$ if $x \ne 0$. The corollary therefore follows from Proposition
9.4.  □ 

For  the  special  case  of  Corollary  9.9  in  which  K  =  C  and  k  =  Q,  this  subfield 
of  C  is  called  the  field  of  algebraic  numbers,  and  any  finite  algebraic  extension  of 
Q  within  C  is  called  a  number  field,  or  an  algebraic  number  field.  The  seeming 
discrepancy  between  this  definition  and  the  definition  given  in  remarks  with 
Proposition 4.1 (that in essence a “number field” is any simple algebraic extension
of $\mathbb{Q}$) will be resolved by the Theorem of the Primitive Element (Theorem 9.34
below). 


2.  Construction  of  Field  Extensions 

In  this  section,  k  denotes  any  field.  Our  interest  will  be  in  constructing  extension 
fields for k and in addressing the question of uniqueness under additional
hypotheses. We begin with a kind of converse to Proposition 9.1 that generalizes the
method described in Section A4 of the appendix for constructing $\mathbb{C} = \mathbb{R}(\sqrt{-1})$
from $\mathbb{R}$ and the polynomial $X^2 + 1$.

Theorem  9.10  (existence  theorem  for  simple  algebraic  extensions).  If  F(X)  is 
a  monic  prime  polynomial  in  k[X],  then  there  exists  a  simple  algebraic  extension 
K  =  k(x)  of  k  such  that  x  is  a  root  of  F(X).  Moreover,  F(X)  is  the  minimal 
polynomial  of  x  over  k. 
PROOF. Define K = k[X]/(F(X)) as a ring. Since F(X) is prime, (F(X)) is
a nonzero prime ideal, hence maximal. Therefore K is a field, an extension field
of k. Define x to be the coset X + (F(X)). Then F(x) = F(X) + (F(X)) =
0 + (F(X)), and x is therefore algebraic over k. It is immediate that K = k[x],
and  Theorem  9.2  shows  that  K  =  k(x).  If  G(x)  =  0  for  some  G(X)  in  k[X], 
then  G(X)  is  in  (F(X)).  We  conclude  that  F(X)  has  minimal  degree  among  all 
polynomials  with  x  as  a  root,  and  F(X)  is  therefore  the  minimal  polynomial.  □ 
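The construction in Theorem 9.10 is concrete enough to check by brute force when the base field is finite. The Python sketch below works in $(\mathbb{Z}/p\mathbb{Z})[X]/(F(X))$ for a monic F, storing each class by its remainder of degree less than deg F as in Theorem 9.2, and tests whether every nonzero class has an inverse. Function names are hypothetical, and the exhaustive search is only practical for very small $p^{\deg F}$.

```python
from itertools import product


def mul_mod_p(a, b, F, p):
    """Product in (Z/pZ)[X]/(F) for monic F of degree n.  Elements are
    coefficient lists of length n, constant term first."""
    n = len(F) - 1
    out = [0] * (2 * n - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % p
    # reduce from the top using X^n = -(F[0] + F[1] X + ... + F[n-1] X^(n-1))
    for k in range(len(out) - 1, n - 1, -1):
        c = out[k]
        if c:
            out[k] = 0
            for i in range(n):
                out[k - n + i] = (out[k - n + i] - c * F[i]) % p
    return out[:n]


def is_field(F, p):
    """Brute-force check that every nonzero class has a multiplicative inverse."""
    n = len(F) - 1
    elements = [list(e) for e in product(range(p), repeat=n)]
    one, zero = [1] + [0] * (n - 1), [0] * n
    return all(any(mul_mod_p(a, b, F, p) == one for b in elements)
               for a in elements if a != zero)


F = [1, 1, 1]        # X^2 + X + 1 over Z/2Z
x = [0, 1]           # the coset X + (F(X))
x_cubed = mul_mod_p(mul_mod_p(x, x, F, 2), x, F, 2)
```

For p = 2 and $F(X) = X^2 + X + 1$, which has no root in $\mathbb{Z}/2\mathbb{Z}$ and so is prime there, the quotient is a 4-element field in which $x = X + (F(X))$ satisfies $x^3 = 1$. For $X^2 + 1 = (X + 1)^2$ over $\mathbb{Z}/2\mathbb{Z}$ the check fails, as it must since that polynomial is not prime.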

Theorem 9.11 (uniqueness theorem for simple algebraic extensions). If F(X)
is a monic prime polynomial in k[X] and if K = k(x) and K' = k(y) are two
simple algebraic extensions such that x and y are roots of F(X), then there exists
a field isomorphism $\varphi$ of K onto K' fixing k and carrying x to y.

Example. The monic polynomial $F(X) = X^3 - 2$ is prime in $\mathbb{Q}[X]$, and
$x = \sqrt[3]{2}$ and $y = e^{2\pi i/3}\sqrt[3]{2}$ are roots of it within $\mathbb{C}$. The fields $\mathbb{Q}(x)$ and $\mathbb{Q}(y)$
are subfields of $\mathbb{C}$ and are distinct because $\mathbb{Q}(x)$ is contained in $\mathbb{R}$ and $\mathbb{Q}(y)$ is
not. Nevertheless, these fields are $\mathbb{Q}$ isomorphic, according to the theorem.

PROOF. In view of the proof of Theorem 9.10, there is no loss of generality
in assuming that K = k[X]/(F(X)). Since y is algebraic over k, we can
form the substitution homomorphism $\varphi_y : k[X] \to k(y)$. This is a k algebra
homomorphism. Its kernel is the ideal (F(X)) since F(X) is the minimal
polynomial of y, and $\varphi_y$ therefore descends to a one-one k algebra homomorphism
$\overline{\varphi}_y : k(x) \to k(y)$. Since $\dim_k k(x)$ and $\dim_k k(y)$ both match the degree of F(X),
$\overline{\varphi}_y$ is onto k(y) and is therefore the required k isomorphism. □

We say that a nonconstant polynomial F(X) in k[X] splits in a given extension
field if F(X) factors completely into degree-one factors over that extension field.
A splitting field over k for a nonconstant polynomial F(X) in k[X] is an extension
field L of k such that F(X) splits in L and such that L is generated by k and the
roots of F(X) in L.

Examples. Let $k = \mathbb{Q}$. Then $\mathbb{Q}(\sqrt{-1})$ is a splitting field for $X^2 + 1$, because
$\pm\sqrt{-1}$ are both in $\mathbb{Q}(\sqrt{-1})$ and they generate $\mathbb{Q}(\sqrt{-1})$ over $\mathbb{Q}$. But $\mathbb{Q}(\sqrt[3]{2})$ is
not a splitting field for $X^3 - 2$ because $\mathbb{Q}(\sqrt[3]{2})$ does not contain the two nonreal
roots of $X^3 - 2$.

Theorem 9.12 (existence of splitting field). If F(X) is a nonconstant
polynomial in k[X], then there exists a splitting field of F(X) over k.

PROOF. We begin by constructing a certain extension field K of k in which
F(X) factors completely into degree-one factors in K[X]. We do so by induction
on n = deg F(X). For n = 1, there is nothing to prove. For general n, let G(X)
be a monic prime factor of F(X), and apply Theorem 9.10 to obtain a simple algebraic
extension $k_1 = k(x_1)$ over k such that $G(x_1) = 0$. Then $F(x_1) = 0$, and the
Factor Theorem (Corollary 1.13) gives $F(X) = (X - x_1)H(X)$ for some H(X)
in $k_1[X]$ of degree n − 1. Since deg H(X) = n − 1 < deg F(X), the inductive
hypothesis produces an extension K of $k_1$ such that H(X) is a constant multiple
of $(X - x_2)\cdots(X - x_n)$ with all $x_i$ in K. Then F(X) factors into degree-one
factors in K[X], and the induction is complete.

Within the constructed field K, let L be the subfield $L = k(x_1, \dots, x_n)$. Then
F(X) still factors completely into degree-one factors in L[X], and L is generated
by k and the $x_i$. Hence L is a splitting field. □

Examples  of  splitting  fields. 

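(1) $k = \mathbb{Q}$ and $F(X) = X^3 - 2$. The proof of Theorem 9.12 takes $k_1 = \mathbb{Q}(\sqrt[3]{2})$
and writes $X^3 - 2 = (X - \sqrt[3]{2})(X^2 + \sqrt[3]{2}\,X + (\sqrt[3]{2})^2)$. Then the proof adjoins
one root $\theta$ (hence both roots) of $X^2 + \sqrt[3]{2}\,X + (\sqrt[3]{2})^2$, setting $K = \mathbb{Q}(\sqrt[3]{2}, \theta)$.
Since the roots of this quadratic are not real while $\mathbb{Q}(\sqrt[3]{2}) \subseteq \mathbb{R}$, the quadratic is prime
over $\mathbb{Q}(\sqrt[3]{2})$, and Corollary 9.7 gives $[K : \mathbb{Q}] = 2 \cdot 3 = 6$.
With this choice of K, the splitting field turns out to be L = K. In fact, to see that
L is not a proper subfield of K, we observe that $6 = [K : \mathbb{Q}] = [K : L][L : \mathbb{Q}]$ by
Corollary 9.7 and that the proper containment $L \supsetneq \mathbb{Q}(\sqrt[3]{2})$ implies $[L : \mathbb{Q}] > 3$.
Since $[L : \mathbb{Q}]$ is a divisor of 6 greater than 3, $[L : \mathbb{Q}] = 6$. Thus $[K : L] = 1$,
and K = L.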

(2) k = Q and F(X) = X^3 - X - 1/3. Application of Corollary 8.20c to the polynomial G(X) = -3X^3F(1/X) = X^3 + 3X^2 - 3 shows that G(X) has no degree-one factor and hence is irreducible over Q. Then it follows that F(X) is irreducible over Q. The proof of Theorem 9.12 takes k_1 = Q(r), where r^3 - r - 1/3 = 0. Then division gives

$$X^3 - X - \tfrac{1}{3} = (X - r)\bigl(X^2 + rX + (r^2 - 1)\bigr).$$

The discriminant b^2 - 4ac of the quadratic factor is

$$r^2 - 4(r^2 - 1) = 4 - 3r^2 = (6r^2 - 3r - 4)^2,$$

the right-hand equality following from direct computation with the relation r^3 = r + 1/3. This discriminant is a square in k_1 = Q(r), and hence X^2 + rX + (r^2 - 1) factors into degree-one factors in Q(r) without passing to an extension field. Therefore L = Q(r) with [L : Q] = 3.
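The "direct computation" behind the right-hand equality can be checked mechanically. The Python sketch below is an illustration, not part of the text's argument: it represents an element a + br + cr^2 of Q(r) as a triple of exact fractions, reduces products with the relation r^3 = r + 1/3, and confirms both that (6r^2 - 3r - 4)^2 = 4 - 3r^2 and that the two roots (-r ± (6r^2 - 3r - 4))/2 of the quadratic factor are again roots of F(X). The helper names mul, add, scale, and F are hypothetical choices for this sketch.

```python
from fractions import Fraction

# An element a + b*r + c*r^2 of Q(r) is stored as the triple (a, b, c),
# where r is a root of F(X) = X^3 - X - 1/3, so that r^3 = r + 1/3.

def add(u, v):
    return tuple(a + b for a, b in zip(u, v))

def scale(c, u):
    return tuple(c * a for a in u)

def mul(u, v):
    """Multiply in Q(r), then reduce using r^k = r^(k-2) + (1/3) r^(k-3)."""
    prod = [Fraction(0)] * 5
    for i, a in enumerate(u):
        for j, b in enumerate(v):
            prod[i + j] += a * b
    for k in (4, 3):                      # eliminate r^4 first, then r^3
        c, prod[k] = prod[k], Fraction(0)
        prod[k - 2] += c
        prod[k - 3] += c / 3
    return tuple(prod[:3])

def F(x):
    """Evaluate X^3 - X - 1/3 at x in Q(r)."""
    cube = mul(x, mul(x, x))
    return add(add(cube, scale(-1, x)), (Fraction(-1, 3), Fraction(0), Fraction(0)))

r = (Fraction(0), Fraction(1), Fraction(0))
s = (Fraction(-4), Fraction(-3), Fraction(6))        # 6r^2 - 3r - 4
disc = (Fraction(4), Fraction(0), Fraction(-3))      # 4 - 3r^2

print(F(r) == (0, 0, 0))          # True: r is a root
print(mul(s, s) == disc)          # True: the discriminant is a square in Q(r)
for sign in (1, -1):
    root = scale(Fraction(1, 2), add(scale(-1, r), scale(sign, s)))
    print(F(root) == (0, 0, 0))   # True: the other two roots lie in Q(r)
```

Exact rational arithmetic is used so that equality tests are genuine identities in Q(r) rather than floating-point approximations.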

Theorem 9.13 (uniqueness of splitting field). If F(X) is a nonconstant polynomial in k[X], then any two splitting fields of F(X) over k are k isomorphic.
The idea of the proof is simple enough, but carrying out the idea runs into a
technical  complication.  The  idea  is  to  proceed  by  induction,  using  the  uniqueness 
result  for  simple  algebraic  extensions  (Theorem  9.11)  repeatedly  until  all  the  roots 
have  been  addressed.  The  difficulty  is  that  after  one  step  the  coefficients  of  the 
two  quotient  polynomials  end  up  in  two  distinct  but  k  isomorphic  fields.  Thus 
at  the  second  step  Theorem  9.11  does  not  apply  directly.  What  is  needed  is  the 
reformulated version given below as Theorem 9.11', which lends itself to this kind
of  induction.  In  addition,  as  soon  as  the  induction  involves  at  least  three  steps,  the 
above  statement  of  Theorem  9.13  does  not  lend  itself  to  a  direct  inductive  proof. 
For  this  reason  we  shall  instead  prove  a  reformulated  version  Theorem  9.13'  of 
Theorem  9.13  that  is  ostensibly  more  general  than  Theorem  9.13. 

Recall  from  Proposition  4.24  that  a  general  substitution  homomorphism  that 
starts  from  a  polynomial  ring  can  have  two  ingredients.  One  is  the  substitution 
of some element, such as x, for the indeterminate X, and the other is a homomorphism that is made to act on the coefficients. If the homomorphism is σ, let us write F^σ(X) to indicate the polynomial obtained by applying σ to each coefficient of F(X).

Theorem 9.11'. Let k and k' be fields, and let σ : k → k' be a field isomorphism. Suppose that F(X) is a monic prime polynomial in k[X] and that K = k(x) and K' = k'(x') are simple algebraic extensions such that F(x) = 0 and F^σ(x') = 0. Then there exists a field isomorphism φ : k(x) → k'(x') such that φ|_k = σ and φ(x) = x'.

PROOF. The argument is essentially unchanged from the proof of Theorem 9.11. We start from the substitution homomorphism G(X) ↦ G^σ(x') that replaces X by x' and that operates by σ on the coefficients. Its kernel contains F(X) because F^σ(x') = 0, so it descends to a field map of k[x] ≅ k[X]/(F(X)) into k'[x'], and the homomorphism must be onto k'[x'] by a count of dimensions. □

Theorem 9.13'. Let k and k' be fields, and let σ : k → k' be a field isomorphism. If F(X) is a nonconstant polynomial in k[X] and if L and L' are respective splitting fields for F(X) over k and for F^σ(X) over k', then there exists a field isomorphism φ : L → L' such that φ|_k = σ and such that φ sends the set of roots of F(X) to the set of roots of F^σ(X).

PROOF. We proceed by induction on n = deg F(X), the case n = 1 being evident. Assume the result for degree n - 1. Let G(X) be a monic prime factor of F(X) over k. Then G^σ(X) is a monic prime factor of F^σ(X) over k'. The polynomials G(X) and G^σ(X) have roots in L and L', respectively. Fix one such root for each, say x_1 and x_1'. By Theorem 9.11', there exists a field isomorphism σ_1 : k(x_1) → k'(x_1') extending σ and satisfying σ_1(x_1) = x_1'. Write F(X) = (X - x_1)H(X) with coefficients in k(x_1), by the Factor Theorem (Corollary 1.13). Applying σ_1 to the coefficients, we obtain F^σ(X) = (X - x_1')H^{σ_1}(X) with coefficients in k'(x_1'). Then L and L' are splitting fields for H(X) and H^{σ_1}(X) over k(x_1) and k'(x_1'), respectively. By induction we can extend σ_1 to an isomorphism φ : L → L', and the theorem readily follows. □


3.  Finite  Fields 

In  this  section  we  shall  use  the  results  on  splitting  fields  in  Section  2  to  classify 
finite  fields  up  to  isomorphism.  So  far,  the  examples  of  finite  fields  that  we  have 
encountered are the prime fields F_p = Z/pZ with p elements, p being any prime number, and the field of 4 elements in Example 3 of fields in Section IV.4. Every finite field has to contain a subfield isomorphic to one of the prime fields F_p, and Proposition 4.33 observed as a consequence that any finite field necessarily has p^n elements for some prime number p and some integer n > 0.

Theorem 9.14. For each p^n with p a prime number and with n a positive integer, there exists up to isomorphism one and only one field with p^n elements. Such a field is a splitting field for X^{p^n} - X over the prime field F_p.

If q = p^n, it is customary to denote by F_q a field of order q. The theorem says that F_q exists and is unique up to isomorphism. Some authors refer to finite
fields  as  Galois  fields. 

Some  preparation  is  needed  before  we  can  come  to  the  proof  of  the  theorem. 
We  need  to  carry  over  the  simplest  aspects  of  differential  calculus  to  polynomials 
with  coefficients  in  an  arbitrary  field  k.  First  we  give  an  informal  definition  of 
the  derivative  of  a  polynomial;  then  we  give  a  more  precise  definition.  For  any 
polynomial F(X) = Σ_{j=0}^{n} c_j X^j in k[X], we informally define the derivative to be the polynomial

$$F'(X) = \sum_{j=1}^{n} j\,c_j X^{j-1} = \sum_{j=0}^{n-1} (j+1)\,c_{j+1} X^{j}.$$

The  more  precise  definition  uses  the  definition  of  members  of  k[X]  as  infinite 
sequences  of  members  of  k  whose  terms  are  0  from  some  point  on.  In  this  notation 
if F = (c_0, c_1, ..., c_n, 0, ...) with c_j in the j-th position for j ≤ n and with 0 in the j-th position for j > n, then F' = (c_1, 2c_2, ..., nc_n, 0, ...) with (j + 1)c_{j+1} in the j-th position for j ≤ n - 1 and with 0 in the j-th position for j > n - 1. Here an integer multiple such as (j + 1)c_{j+1} means the (j + 1)-fold sum c_{j+1} + ··· + c_{j+1} in k. In any event, the mapping F ↦ F' is k linear from k[X] to itself. The operation is called differentiation.
Proposition 9.15. Differentiation on k[X] satisfies the product rule: F = GH implies F' = G'H + GH'.

PROOF. Since differentiation is k linear, both sides of the identity (GH)' = G'H + GH' are k linear in G for fixed H and k linear in H for fixed G. Hence it is enough to prove the result for monomials. Thus let G(X) = X^m and H(X) = X^n, so that F(X) = X^{m+n}. Then F'(X) = (m + n)X^{m+n-1}, G'(X)H(X) = mX^{m+n-1}, and G(X)H'(X) = nX^{m+n-1}. Hence we indeed have F'(X) = G'(X)H(X) + G(X)H'(X). □
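The sequence description of the derivative translates directly into code. In the Python sketch below, integer coefficients stand in for elements of k (an assumption made only for illustration), a polynomial is the list [c_0, c_1, ..., c_n] of its coefficients, and the product rule of Proposition 9.15 is checked on one pair of polynomials. The function names are hypothetical.

```python
def derivative(f):
    """Formal derivative: entry j of F' is (j + 1) * c_{j+1}."""
    return [(j + 1) * f[j + 1] for j in range(len(f) - 1)]

def poly_add(f, g):
    n = max(len(f), len(g))
    return [(f[i] if i < len(f) else 0) + (g[i] if i < len(g) else 0) for i in range(n)]

def poly_mul(f, g):
    if not f or not g:
        return []
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return out

def trim(f):
    """Remove trailing zero coefficients so equal polynomials compare equal."""
    f = list(f)
    while f and f[-1] == 0:
        f.pop()
    return f

G = [1, 2, 3]          # 1 + 2X + 3X^2
H = [5, 0, -1, 4]      # 5 - X^2 + 4X^3
lhs = derivative(poly_mul(G, H))
rhs = poly_add(poly_mul(derivative(G), H), poly_mul(G, derivative(H)))
print(trim(lhs) == trim(rhs))   # True
```

A constant polynomial [c] has derivative [], the zero polynomial, in keeping with the sequence definition.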

Corollary 9.16. If n is a positive integer, if r is in k, and if F(X) = (X - r)^n in k[X], then F'(X) = n(X - r)^{n-1}.

PROOF. This is immediate by induction from Proposition 9.15 since the derivative of X - r is 1. □

Corollary 9.17. Let r be in k, and let F(X) be in k[X]. If (X - r)^2 divides F(X), then F(r) = F'(r) = 0. Conversely if F(r) = F'(r) = 0, then (X - r)^2 divides F(X).

PROOF. Write F(X) = (X - r)^2 G(X). If we substitute r for X, we see that F(r) = 0. If instead we differentiate, using Proposition 9.15 and Corollary 9.16, then we obtain F'(X) = 2(X - r)G(X) + (X - r)^2 G'(X). Substituting r for X, we obtain F'(r) = 0 + 0 = 0.

For the converse, let F(r) = F'(r) = 0. Proposition 4.28a shows that F(X) = (X - r)G(X). Differentiating this identity by means of Proposition 9.15 gives F'(X) = G(X) + (X - r)G'(X). Substituting r for X yields 0 = F'(r) = G(r) + 0 and shows that G(r) = 0. By Proposition 4.28, G(X) = (X - r)H(X). Hence F(X) = (X - r)^2 H(X). □
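Over a finite prime field F_p, Corollary 9.17 gives a finite test for repeated roots: check each r in F_p for F(r) = F'(r) = 0. The Python sketch below is illustrative only, its names are hypothetical, and it uses F(X) = (X - 1)^2(X - 2) = X^3 - 4X^2 + 5X - 2 over F_5 as a sample; reduced mod 5, its coefficient list (low degree first) is [3, 0, 1, 1].

```python
def evaluate(f, x, p):
    """Horner evaluation of the coefficient list f (low degree first) at x in F_p."""
    acc = 0
    for c in reversed(f):
        acc = (acc * x + c) % p
    return acc

def derivative_mod(f, p):
    return [((j + 1) * f[j + 1]) % p for j in range(len(f) - 1)]

def repeated_roots(f, p):
    """Elements r of F_p with F(r) = F'(r) = 0, i.e. (X - r)^2 divides F(X)."""
    df = derivative_mod(f, p)
    return [r for r in range(p) if evaluate(f, r, p) == 0 and evaluate(df, r, p) == 0]

f = [3, 0, 1, 1]   # X^3 - 4X^2 + 5X - 2 reduced mod 5
print([r for r in range(5) if evaluate(f, r, 5) == 0])  # [1, 2]
print(repeated_roots(f, 5))                              # [1]
```

The simple root 2 satisfies F(2) = 0 but F'(2) = 1 ≠ 0, so only the double root 1 is reported.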

Lemma 9.18. If k is a field of characteristic p ≠ 0, then the map φ : k → k given by φ(x) = x^p is a field mapping.

Remark. The map x ↦ x^p is often called the Frobenius map. A field mapping is automatically one-one, so if k is a finite field, then the Frobenius map must carry k onto k, since one-one implies onto for functions from a finite set to itself; in this case the map is an automorphism of k.

PROOF. The computation φ(uv) = (uv)^p = u^p v^p = φ(u)φ(v) shows that φ respects products. If u and v are in k, then

$$\varphi(u+v) = (u+v)^p = \varphi(u) + \sum_{j=1}^{p-1} \binom{p}{j} u^{p-j} v^{j} + \varphi(v) = \varphi(u) + \varphi(v),$$

the last equality holding since, for 1 ≤ j ≤ p - 1, the binomial coefficient p!/(j!(p - j)!) has a factor p in the numerator and no factor p in the denominator, hence is divisible by p and is 0 in k. Thus φ is a ring homomorphism. Since φ(1) = 1, φ is a field mapping. □
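The divisibility fact used in the last step is easy to test numerically. The Python sketch below (illustrative; names hypothetical) checks that n divides every binomial coefficient C(n, j) with 1 ≤ j ≤ n - 1 exactly for the primes among the values tried. It also expands (1 + X)^n with coefficients reduced mod n; for prime n = p this exhibits φ(1 + X) = 1 + X^p = φ(1) + φ(X) in F_p[X], a ring of characteristic p contained in the field F_p(X).

```python
from math import comb

def middle_binomials_divisible(n):
    """True when n divides C(n, j) for every 1 <= j <= n - 1."""
    return all(comb(n, j) % n == 0 for j in range(1, n))

def one_plus_x_power_mod(n):
    """Coefficients of (1 + X)^n, reduced mod n, by n multiplications by 1 + X."""
    f = [1]
    for _ in range(n):
        f = [((f[j] if j < len(f) else 0) + (f[j - 1] if j >= 1 else 0)) % n
             for j in range(len(f) + 1)]
    return f

print([n for n in range(2, 16) if middle_binomials_divisible(n)])  # [2, 3, 5, 7, 11, 13]
print(one_plus_x_power_mod(5))   # [1, 0, 0, 0, 0, 1]
print(one_plus_x_power_mod(4))   # [1, 0, 2, 0, 1]
```

The composite case n = 4 leaves the middle coefficient 6 ≡ 2, which is why the lemma needs p prime.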
Proof of uniqueness in Theorem 9.14. Let k be a finite field, say of characteristic p, and let P be the prime field of order p within k. We know that P is isomorphic to F_p = Z/pZ. Since k is a finite-dimensional vector space over P, we know also that k has order q = p^n for some integer n > 0. The multiplicative group k^× of k thus has order q - 1, and every x ≠ 0 in k therefore satisfies x^{q-1} = 1 by Lagrange's Theorem. Taking x = 0 into account, we see that every member of k satisfies x^q = x. Forming the polynomial X^q - X in P[X], we see that every member of k is a root of this polynomial. Iterated application q times of the Factor Theorem (Corollary 1.13) shows that X^q - X factors into degree-one factors in k[X]. Since every member of k is a root of X^q - X, these roots generate k over P, and k is a splitting field of X^q - X over P. Then the uniqueness of the prime field up to isomorphism, in combination with the uniqueness of the splitting field of X^q - X given in Theorem 9.13', shows that k is uniquely determined up to isomorphism. □
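As a concrete instance of the step "every member of k satisfies x^q = x," the Python sketch below builds a field with 8 elements as F_2[X]/(X^3 + X + 1). The modulus is an assumed choice for illustration; it is irreducible because a cubic with no root in F_2 has no degree-one factor. Elements are stored as 3-bit masks, and the checks confirm x^7 = 1 for every x ≠ 0 and x^8 = x for every x. The names are hypothetical.

```python
MOD = 0b1011        # X^3 + X + 1 over F_2 (assumed modulus; no roots in F_2)
DEG = 3
ORDER = 1 << DEG    # the field has q = 2^3 = 8 elements

def gf_mul(a, b):
    """Multiply two elements of F_2[X]/(X^3 + X + 1) given as bit masks."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # add a (addition in characteristic 2 is XOR)
        b >>= 1
        a <<= 1                  # multiply a by X
        if a & ORDER:
            a ^= MOD             # reduce using X^3 = X + 1
    return result

def gf_pow(a, e):
    result = 1
    for _ in range(e):
        result = gf_mul(result, a)
    return result

print(all(gf_pow(x, ORDER - 1) == 1 for x in range(1, ORDER)))   # True
print(all(gf_pow(x, ORDER) == x for x in range(ORDER)))          # True
```

The first check is the Lagrange step for the multiplicative group of order q - 1 = 7; the second includes x = 0.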

Proof of existence in Theorem 9.14. Let q = p^n be given, and define k to be a splitting field of X^q - X over F_p = Z/pZ. The field k exists by Theorem 9.12, and it has characteristic p. Since X^q - X is monic of degree q, the definition of splitting field says that we can write

$$X^q - X = (X - u_1)(X - u_2)\cdots(X - u_q) \qquad \text{with all } u_j \in k.$$

Because of Lemma 9.18, the map φ(u) = u^q, which is the n-th power of the map u ↦ u^p, is a field mapping of k into itself. The members of k fixed by φ form a subfield of k, and these elements of k are exactly the members of the set S = {u_1, ..., u_q}. Therefore S is a subfield of k, necessarily containing F_p = Z/pZ. Since X^q - X splits in S and since the roots of X^q - X generate S, S is a splitting field of X^q - X over F_p. Since k is generated over F_p by these same roots, S = k. To complete the proof, it is enough to show that the elements u_1, ..., u_q are distinct, and then k will be a field of q elements. The question is therefore whether some root of X^q - X has multiplicity at least 2, i.e., whether (X - r)^2 divides X^q - X for some r in k. Corollary 9.17 gives a necessary condition for this divisibility, saying that the derivative of X^q - X must have r as a root. However, since q = p^n is 0 in k, the derivative of X^q - X is qX^{q-1} - 1 = -1, and the constant polynomial -1 has no roots. We conclude that k has q elements. □
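The final step of the proof is the observation that the formal derivative of X^q - X is the constant -1 in characteristic p. The Python sketch below (illustrative; names hypothetical) forms the coefficient list of X^q - X over F_p, writing -1 as p - 1, differentiates it coefficientwise mod p, and checks for several prime powers q = p^n that only the constant term p - 1 survives.

```python
def x_power_minus_x(q, p):
    """Coefficient list (low degree first) of X^q - X over F_p, for q >= 2."""
    f = [0] * (q + 1)
    f[q] = 1
    f[1] = (f[1] - 1) % p
    return f

def derivative_mod(f, p):
    return [((j + 1) * f[j + 1]) % p for j in range(len(f) - 1)]

def trim(f):
    while f and f[-1] == 0:
        f.pop()
    return f

for p, n in [(2, 1), (2, 3), (3, 2), (5, 2), (7, 1)]:
    q = p ** n
    # prints q followed by [p - 1], the constant polynomial -1 in F_p
    print(q, trim(derivative_mod(x_power_minus_x(q, p), p)))
```

The leading term contributes q·X^{q-1}, and q = p^n reduces to 0 mod p, which is exactly the point used in the proof.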

Corollary 9.19. If q and r are integers (necessarily prime powers) with 2 ≤ q ≤ r, then the finite field F_q is isomorphic to a subfield of the finite field F_r if and only if r = q^n for some integer n ≥ 1.

PROOF. If F_q is isomorphic to a subfield of F_r, then we may consider F_r as a vector space over F_q, say of dimension n. In this case, F_r has q^n elements.

Conversely let r = q^n, and regard F_r as a splitting field of X^{q^n} - X over the prime field F_p, by Theorem 9.14. Let S be the subset of F_r of all roots of X^q - X. Putting a = q - 1 and k = (q^n - 1)/(q - 1) = q^{n-1} + q^{n-2} + ··· + 1, we have

$$X^{ka} - 1 = (X^{a} - 1)\bigl(X^{(k-1)a} + X^{(k-2)a} + \cdots + 1\bigr).$$

Since ka = q^n - 1, multiplying by X shows that X^q - X is a factor of X^{q^n} - X. Since X^{q^n} - X splits in F_r and has distinct roots, the same is true of X^q - X. Therefore S has exactly q elements.

Let q = p^m. The m-th power of the homomorphism of Lemma 9.18 on the field F_r is x ↦ x^q, and the subset of F_r fixed by this homomorphism is a subfield. Thus S is a subfield, and it has q elements. □
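The divisibility of X^{q^n} - X by X^q - X can also be confirmed by polynomial long division over F_p. The Python sketch below (illustrative; names hypothetical; pow(x, -1, p) requires Python 3.8 or later) computes remainders in F_p[X]. It shows that X^4 - X divides X^16 - X over F_2, consistent with F_4 sitting inside F_16, while X^4 - X leaves the nonzero remainder X^2 - X = X^2 + X on X^8 - X, consistent with F_4 not embedding in F_8.

```python
def trim(f):
    while f and f[-1] == 0:
        f.pop()
    return f

def poly_rem(f, g, p):
    """Remainder of f modulo g in F_p[X]; coefficient lists, low degree first."""
    f = trim([c % p for c in f])
    g = trim([c % p for c in g])
    inv_lead = pow(g[-1], -1, p)          # inverse of the leading coefficient mod p
    while len(f) >= len(g):
        coef = (f[-1] * inv_lead) % p
        shift = len(f) - len(g)
        for i, c in enumerate(g):
            f[shift + i] = (f[shift + i] - coef * c) % p
        trim(f)                           # the leading term has been cancelled
    return f

def x_power_minus_x(m):
    """Integer coefficient list of X^m - X (m >= 2); poly_rem reduces it mod p."""
    f = [0] * (m + 1)
    f[m], f[1] = 1, -1
    return f

print(poly_rem(x_power_minus_x(16), x_power_minus_x(4), 2))   # []
print(poly_rem(x_power_minus_x(8), x_power_minus_x(4), 2))    # [0, 1, 1]
print(poly_rem(x_power_minus_x(27), x_power_minus_x(3), 3))   # []
```

An empty list is the zero remainder, so the first and third lines exhibit divisibility for (q, n) = (4, 2) and (3, 3).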


4.  Algebraic  Closure 

Algebraically closed fields, those for which every nonconstant polynomial with coefficients in the field has a root in the field, were introduced in Section V.1, and
it  was  mentioned  at  that  time  that  every  field  is  a  subfield  of  some  algebraically 
closed  field.  We  shall  prove  that  existence  theorem  in  this  section  in  a  form 
lending  itself  to  a  uniqueness  result. 

Throughout  this  section  let  k  be  a  field.  We  begin  by  giving  further  descriptions 
of  algebraically  closed  fields  that  take  the  theory  of  Sections  1-2  into  account. 

Proposition  9.20.  The  following  conditions  on  the  field  k  are  equivalent: 

(a)  k  has  no  nontrivial  algebraic  extensions, 

(b)  every  irreducible  polynomial  in  k[X]  has  degree  1, 

(c)  every  polynomial  in  k[X]  of  positive  degree  has  at  least  one  root  in  k, 

(d) every polynomial in k[X] of positive degree factors over k into polynomials of degree 1.

PROOF.  If  (a)  holds,  then  (b)  holds  since  any  irreducible  polynomial  of  degree 
greater  than  1  would  give  a  nontrivial  simple  algebraic  extension  (Theorem  9.10). 
If  (b)  holds  and  a  polynomial  of  positive  degree  is  given,  apply  (b)  to  an  irreducible 
factor  to  see  that  the  given  polynomial  has  a  root;  thus  (c)  holds.  Condition  (c) 
implies  condition  (d)  by  induction  and  the  Factor  Theorem.  If  (d)  holds  and  if 
K  is  an  algebraic  extension  of  k,  let  x  be  in  K,  and  let  F(X)  be  the  minimal 
polynomial  of  x  over  k.  Then  F(X)  is  irreducible  over  k,  and  (d)  says  that  F(X) 
has  degree  1 .  Hence  x  is  in  k,  and  we  conclude  that  K  =  k.  □ 

A field satisfying the equivalent conditions of Proposition 9.20 is said to be algebraically closed.
Examples of algebraically closed fields.

(1)  The  Fundamental  Theorem  of  Algebra  (Theorem  1.18)  says  that  C  is 
algebraically  closed.  This  theorem  was  not  proved  in  Chapter  I,  but  a  proof 
will  be  given  in  this  chapter  in  Section  10. 

(2)  Let  K  be  the  subset  of  all  members  of  C  that  are  algebraic  over  Q.  By 
Corollary 9.9, K is a subfield of C. Example 1 shows that every polynomial in
Q[X]  splits  in  K,  and  Lemma  9.21  below  then  allows  us  to  conclude  that  K  is 
algebraically  closed. 

(3) Fix a prime number p, and start with k_0 = F_p as the prime field Z/pZ. Enumerate the nonconstant members of F_p[X], which form a countable set, letting F_n(X) be the n-th such polynomial for n ≥ 1. We construct k_n by induction on n so that k_n is a splitting field for F_n(X) over k_{n-1} when n ≥ 1. Then k_0 ⊆ k_1 ⊆ k_2 ⊆ ··· is an increasing sequence of fields containing F_p. Let K be the union. Any two elements of K lie in a single k_n, and it follows that K is closed under the field operations. Any three elements lie in a single k_n, and it follows that any of the defining properties of a field is valid in K because it is valid in k_n. Therefore K is a field. This field is an algebraic extension of F_p, since each of its elements lies in some k_n and k_n is a finite extension of F_p, and every polynomial in F_p[X] splits in K. As in Example 2, Lemma 9.21 below shows that K is algebraically closed.
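The enumeration of F_p[X] in Example 3 can be made explicit. The Python sketch below gives one possible ordering among many (names hypothetical): it lists the nonconstant polynomials over F_p degree by degree. Each degree d contributes only the finitely many (p - 1)p^d polynomials of that degree, so every nonconstant polynomial receives a finite index n and hence a place F_n(X) in the sequence.

```python
from itertools import count, islice, product

def nonconstant_polys(p):
    """Yield each nonconstant polynomial over F_p once, as a coefficient list
    (low degree first) with nonzero leading coefficient, in order of degree."""
    for d in count(1):
        for lead in range(1, p):
            for lower in product(range(p), repeat=d):
                yield list(lower) + [lead]

first = list(islice(nonconstant_polys(2), 6))
print(first)  # [[0, 1], [1, 1], [0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]]
```

Ordering by degree is what guarantees a finite index for every polynomial; an ordering that first listed all polynomials of one shape could never reach the others.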

Lemma 9.21. If K/k is an algebraic extension of fields and if every nonconstant polynomial in k[X] splits into degree-one factors in K, then K is algebraically closed.

PROOF. Let K' be an algebraic extension of K, and let x be in K'. Let G(X) be the minimal polynomial of x over K, and write G(X) as

$$G(X) = X^n + c_{n-1}X^{n-1} + \cdots + c_0 \qquad \text{with all } c_j \in K.$$

Then x is algebraic over k(c_{n-1}, ..., c_0), which is a finite extension of k by Theorem 9.8. By Corollary 9.7, x lies in a finite extension of k. Thus Proposition 9.4 shows that x is algebraic over k. Let F(X) be the minimal polynomial of x over k. By assumption this splits over K, say as

$$F(X) = (X - x_1)\cdots(X - x_m) \qquad \text{with all } x_j \in K.$$

Evaluating at x and using the fact that F(x) = 0, we see that x = x_j for some j. Therefore x is in K. Since x was arbitrary, K' = K; thus K has no nontrivial algebraic extensions and is algebraically closed by Proposition 9.20. □

An extension field K/k is an algebraic closure of k if K is algebraic over k and if K is algebraically closed. Example 2 of algebraically closed fields above gives an algebraic closure of Q, and Example 3 gives an algebraic closure of F_p.
Theorem 9.22 (Steinitz). Every field k has an algebraic closure, and this is
unique  up  to  k  isomorphism. 

Remarks.  The  proof  of  existence  is  modeled  on  the  argument  for  Example  3 
of  algebraic  closures.  However,  we  are  not  free  in  general  to  use  a  simple  union 
of  a  sequence  of  fields  and  have  to  work  harder.  Because  there  is  no  evident  set 
of  possibilities  within  which  we  are  forming  extension  fields,  Zorn’s  Lemma  is 
inconvenient  to  use  and  tends  to  result  in  an  unintuitive  construction.  Instead, 
we  use  Zermelo’s  Well-Ordering  Theorem,  whose  use  more  closely  parallels  the 
inductive  construction  in  Example  3. 

Proof of existence. With k as the given field, let S be the set of nonconstant polynomials s(X) in k[X], and introduce a well ordering into S by means of Zermelo's Well-Ordering Theorem (Section A5 of the appendix). Let us write ≺ for “strictly precedes in the ordering” and ⪯ for “equals or strictly precedes.” For each s ∈ S that is not the last element of S, let s⁺ be the successor of s, i.e., the first element among all elements t with s ≺ t. We write s_0 for the first element of S. Without loss of generality, we may assume that S has a last element s_∞. The idea is to construct simultaneously two kinds of things:

(i) an algebraic extension field k_s/k for each s ∈ S such that k_{s_0} = k and such that k_{s⁺} is a splitting field for s(X) over k_s whenever s ≺ s_∞,

(ii) a field mapping φ_{ut} : k_t → k_u for each ordered pair of elements t and u in S having t ⪯ u, such that φ_{tt} = 1 for all t and such that t ⪯ u ⪯ v implies φ_{vt} = φ_{vu}φ_{ut}.

These extension fields and mappings are to be such that k_s = ⋃_{t≺s} φ_{st}(k_t) whenever s is not a successor and is not s_0. If such a system of extension fields and field homomorphisms exists, then Lemma 9.21 applies to a splitting field over k_{s_∞} of the nonconstant polynomial s_∞(X) and shows that this splitting field is algebraically closed; since this splitting field is an algebraic extension of k, it is an algebraic closure of k.

A partial such system through t_0 means a system consisting of fields k_s with s ⪯ t_0 and field homomorphisms φ_{ut} with t ⪯ u ⪯ t_0 such that the above conditions hold as far as they are applicable. A partial system exists through the first member s_0 of S because we can take k_{s_0} = k and φ_{s_0 s_0} = 1. Arguing by contradiction, we suppose that such a system of extension fields and field homomorphisms fails to exist through some member of S. Let t_0 be the first member of S such that there is no partial system through t_0.

Suppose that t_0 is the successor of some element t_1 in S. We know that a partial system exists through t_1. If we let k_{t_0} be a splitting field for t_1(X) over k_{t_1}, let φ_{t_0 t_1} be the inclusion of k_{t_1} into k_{t_0}, and define

$$\varphi_{t_0 t} = \begin{cases} \varphi_{t_0 t_1}\,\varphi_{t_1 t} & \text{for } t \preceq t_1, \\ 1 & \text{for } t = t_0, \end{cases}$$

then the enlarged system is a partial system through t_0, contradiction. Thus t_0 cannot be the successor of some element of S.

When t_0 is not a successor, at least k_t is defined for t ≺ t_0 and φ_{ut} is defined for t ⪯ u ≺ t_0. We want to form a union, but we have to keep the field operations aligned properly in the process. Define a “t-allowable tuple” to be a function u ↦ x_u defined for t ⪯ u ≺ t_0 such that x_u is in k_u and φ_{vu}(x_u) = x_v whenever t ⪯ u ⪯ v ≺ t_0. If x is in k_t, then an example of a t-allowable tuple is given by u ↦ φ_{ut}(x) for t ⪯ u ≺ t_0.

If t ≺ t_0 and t' ≺ t_0, then we can apply field operations to the t-allowable tuple u ↦ x_u and to the t'-allowable tuple u ↦ y_u, obtaining max(t, t')-allowable tuples u ↦ x_u + y_u, u ↦ -x_u, u ↦ x_u y_u, and u ↦ x_u^{-1} as long as x_t ≠ 0. These operations are meaningful since each φ_{vu} is a field mapping.

If t ≺ t_0 and t' ≺ t_0, we say that the t-allowable tuple u ↦ x_u is equivalent to the t'-allowable tuple u ↦ y_u if x_u = y_u for max(t, t') ⪯ u ≺ t_0. The result is an equivalence relation, and the equivalence relation respects the field operations in the previous paragraph. We define k_{t_0} to be the set of equivalence classes of allowable tuples with the inherited field operations. The 0 element is the class of the s_0-allowable tuple u ↦ 0, and the multiplicative identity is the class of the s_0-allowable tuple u ↦ 1. It is a routine matter to check that k_{t_0} is a field.

If t ≺ t_0 is given, we define the function φ_{t_0 t} : k_t → k_{t_0} as follows: if x is in k_t, we form the t-allowable tuple u ↦ φ_{ut}(x) and take its equivalence class, which is a member of k_{t_0}, as φ_{t_0 t}(x). Then φ_{t_0 t} is evidently a field mapping. It is evident also that φ_{t_0 v}φ_{vu} = φ_{t_0 u} when u ⪯ v ≺ t_0. Defining φ_{t_0 t_0} to be the identity, we have a complete system of field mappings φ_{vu} for k_{t_0}.

The final step is to check that k_{t_0} is the union of the images of the φ_{t_0 t} for t ≺ t_0. Thus choose a representative of an equivalence class in k_{t_0}. Let the representative be a t-allowable tuple u ↦ x_u for t ⪯ u ≺ t_0. The element x_t is in k_t, and the condition x_u = φ_{ut}(x_t) is just the condition that the class of u ↦ x_u be the image of x_t under φ_{t_0 t}. Hence every member of k_{t_0} is in the image of some φ_{t_0 t} with t ≺ t_0, and we have a contradiction to the hypothesis that a partial system through t_0 does not exist. This completes the proof of existence. □


For  the  uniqueness  in  Theorem  9.22,  we  again  need  a  serious  application  of 
the  Axiom  of  Choice,  but  here  Zorn’s  Lemma  can  be  applied  fairly  routinely. 
The  proof  will  show  a  little  more  than  is  needed,  and  in  fact  the  uniqueness  in 
Theorem  9.22  will  be  derived  as  a  consequence  of  Theorem  9.23  below. 


Theorem 9.23. Let K' be an algebraically closed field, and let K be an algebraic extension of a field k. If φ is a field mapping of k into K', then φ can be extended to a field mapping of K into K'.
Proof of uniqueness in Theorem 9.22 using Theorem 9.23. Let K and K' be algebraic closures of k, and let φ : k → K' be the inclusion mapping. Theorem 9.23 supplies a field mapping Φ : K → K' such that Φ|_k = φ, i.e., such that Φ fixes k. Since K is an algebraic closure of k, so is Φ(K). Then K', being algebraic over k, is an algebraic extension of the algebraically closed field Φ(K), and Proposition 9.20a forces Φ(K) = K'. Thus Φ is a k isomorphism of K onto K'.

Proof of Theorem 9.23. Let S be the set of all triples (L, L', ψ) such that L is a field with k ⊆ L ⊆ K and ψ is a field mapping of L onto the subfield L' of K' with ψ|_k = φ. The set S is nonempty since (k, φ(k), φ) is a member of it. Defining (L_1, L_1', ψ_1) ⊆ (L_2, L_2', ψ_2) to mean that L_1 ⊆ L_2, that L_1' ⊆ L_2', and that ψ_1 as a set of ordered pairs is a subset of ψ_2 as a set of ordered pairs, we partially order S by inclusion upward. If {(L_α, L_α', ψ_α)} is a nonempty chain in S, form the triple (⋃_α L_α, ⋃_α L_α', ⋃_α ψ_α), and put ψ = ⋃_α ψ_α. Then ψ(⋃_α L_α) = ⋃_α L_α', and consequently (⋃_α L_α, ⋃_α L_α', ⋃_α ψ_α) is an upper bound in S for the chain. By Zorn's Lemma, S has a maximal element (L_0, L_0', ψ_0). We shall prove that L_0 = K, and the proof will be complete.

Fix x in K, and let F(X) be the minimal polynomial of x over L_0. Since ψ_0 : L_0 → L_0' is an isomorphism, F^{ψ_0}(X) is a monic prime polynomial in L_0'[X]. Since K' is algebraically closed, F^{ψ_0}(X) has a root x' in K'. By Theorem 9.11', ψ_0 : L_0 → L_0' can be extended to an isomorphism ψ_0' : L_0(x) → L_0'(x') such that ψ_0'(x) = x'. Then (L_0(x), L_0'(x'), ψ_0') is in S and contains (L_0, L_0', ψ_0). This containment, if strict, would contradict the fact that (L_0, L_0', ψ_0) is a maximal element of S. Thus equality must hold: L_0(x) = L_0. Therefore x is in L_0, and we conclude that L_0 = K. □

The use of algebraic closures allows us to simplify understanding of splitting fields. If we are working with a field k and if k̄ is a fixed algebraic closure of k, then the existence and uniqueness of the splitting field of a polynomial F(X) in k[X] becomes evident; no isomorphisms are involved. Namely let α_1, ..., α_n be the roots of F(X) in k̄. Then the subfield of k̄ generated by k and α_1, ..., α_n is the splitting field of F(X), and it is manifestly unique. Henceforth when we refer
to  the  splitting  field  of  a  polynomial  over  a  field  k,  it  is  with  an  understanding  of 
working  within  a  fixed  algebraic  closure  in  this  way. 


5.  Geometric  Constructions  by  Straightedge  and  Compass 

Classical  Euclidean  geometry  attached  a  certain  emphasis  to  constructions  in  the 
Euclidean  plane  that  could  be  made  by  straightedge  and  compass.  These  are 
often referred to casually as constructions by “ruler and compass,” but one is not allowed to use the markings on a ruler. Thus “straightedge and compass” is a more accurate description.

In  these  constructions  the  starting  configuration  may  be  regarded  as  a  line  with 
two  points  marked  on  the  line.  Allowable  constructions  are  the  following:  to  form 
the  line  through  a  given  point  different  from  finitely  many  other  lines  through  that 
point,  to  form  the  line  through  two  distinct  points,  to  form  a  circle  with  a  given 
center  and  a  radius  different  from  that  of  finitely  many  other  circles  through  the 
point,  and  to  form  a  circle  with  a  given  center  and  radius.  Intersections  of  a  line 
or  a  circle  with  previous  lines  and  circles  establish  new  points  for  continuing  the 
construction. 

For example, a line perpendicular to a given line at a given point can be
constructed  by  drawing  any  circle  centered  at  the  point,  using  the  two  intersection 
points  as  centers  of  new  circles,  drawing  those  circles  so  as  to  have  radius  larger 
than  the  first  circle,  and  forming  the  line  between  their  two  points  of  intersection. 
An  angle  at  the  point  P  of  intersection  between  two  intersecting  lines  A  and  B 
may  be  bisected  by  drawing  any  circle  centered  at  P,  selecting  one  of  the  points 
of  intersection  on  each  line  so  that  P  and  the  two  new  points  Q  and  R  describe 
the  angle,  drawing  circles  with  that  same  radius  centered  at  Q  and  R,  and  forming 
the  line  between  the  points  of  intersection  of  the  two  circles.  And  so  on. 
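The perpendicular construction can be imitated in coordinates. The Python sketch below is a numerical illustration only, with hypothetical names: circle_intersections uses the standard formula for the intersection points of two circles, and perpendicular_at follows the steps in the text for the x-axis and a point (x0, 0), with assumed radii r and big_r > r. Both resulting points have first coordinate x0, so the line through them is the perpendicular.

```python
import math

def circle_intersections(c1, r1, c2, r2):
    """Intersection points of the circles with centers c1, c2 and radii r1, r2."""
    (x1, y1), (x2, y2) = c1, c2
    dx, dy = x2 - x1, y2 - y1
    d = math.hypot(dx, dy)
    if d == 0 or d > r1 + r2 or d < abs(r1 - r2):
        return []                                  # concentric or non-meeting circles
    a = (r1 ** 2 - r2 ** 2 + d ** 2) / (2 * d)     # distance from c1 to the common chord
    h = math.sqrt(max(r1 ** 2 - a ** 2, 0.0))      # half the length of the chord
    mx, my = x1 + a * dx / d, y1 + a * dy / d      # midpoint of the chord
    return [(mx + h * dy / d, my - h * dx / d), (mx - h * dy / d, my + h * dx / d)]

def perpendicular_at(x0, r=1.0, big_r=2.0):
    """Two points on the perpendicular to the x-axis at (x0, 0), assuming big_r > r."""
    # The circle of radius r about (x0, 0) meets the x-axis at these two points.
    q1, q2 = (x0 - r, 0.0), (x0 + r, 0.0)
    # Circles of the larger radius big_r about q1 and q2 meet above and below the axis.
    return circle_intersections(q1, big_r, q2, big_r)

print(perpendicular_at(3.0))   # two points, each with first coordinate 3.0
```

The requirement big_r > r mirrors the text's instruction that the second circles have radius larger than the first; with equal radii the two circles would meet only on the x-axis itself.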

Three  notable  problems  remained  unsolved  in  antiquity: 

(i)  how  to  double  a  cube,  i.e.,  how  to  construct  the  side  of  a  cube  of  double 
the  volume  of  a  given  cube, 

(ii)  how  to  trisect  any  constructible  angle,  i.e.,  how  to  divide  the  angle  into 
three  equal  parts  by  means  of  constructed  lines, 

(iii)  how  to  square  a  circle,  i.e.,  how  to  construct  the  side  of  a  square  whose 
area  equals  that  of  a  given  disk. 

In this section we shall use the elementary field theory of Sections 1-2 to show that doubling a cube and trisecting a 60-degree angle are impossible with straightedge and compass. As to (iii), we shall reduce a proof of the impossibility of squaring the circle to a proof that π is transcendental over Q. This latter proof we give in
Section  14. 

The  first  step  is  to  translate  the  problem  of  geometric  constructibility  into  a 
statement  in  algebra.  Since  we  are  given  two  points  on  a  line,  we  can  introduce 
Cartesian  coordinates  for  the  Euclidean  plane,  taking  one  of  the  points  to  be  (0,  0) 
and the other point to be (1, 0). Points in the Euclidean plane are now determined
by  their  Cartesian  coordinates,  which  determine  all  distances.  Distances  in  turn 
can  be  laid  off  on  the  x-axis  from  (0,  0).  Thus  the  question  becomes,  what  points 
on  the  x-axis  can  be  constructed? 

Let  C  be  the  set  of  constructible  x  coordinates.  We  are  given  that  0  and  1  are 
in  C.  Closure  of  C  under  addition  and  subtraction  is  evident;  the  straightedge  is 
not even necessary for this step. Figure 9.1 indicates why the positive elements
of C are closed under multiplication and division.

Figure 9.1. Closure of positive constructible x coordinates
under multiplication and division.

In more detail we take two intersecting lines and mark three known positive
members of C as the distances a, b, c in the figure. Then we form the line through
the two points marking a and b, and we form a line parallel to that line through the
point marked off by the distance c. The intersection of this parallel line with the
other original line defines a distance d. Then a/b = c/d, and so d = bc/a. By taking
a = 1, we see that we can multiply any two members b and c in C, obtaining a result
in C. By instead taking c = 1, we see that we can divide. The conclusion is that C is a
field.
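The similar-triangle argument for multiplication and division can be checked numerically. The following Python sketch is only an illustration, not part of the proof: the placement of the two lines in coordinates (one along the x-axis, the other at an arbitrary angle), the floating-point arithmetic, and the helper names `intersect_lines` and `fourth_proportional` are our own assumptions. It carries out the construction of Figure 9.1 and recovers d = bc/a.

```python
import math

def intersect_lines(p1, p2, p3, p4):
    """Intersection of the line through p1, p2 with the line through p3, p4 (not parallel)."""
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p1, p2, p3, p4
    den = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    t = ((x1 - x3) * (y3 - y4) - (y1 - y3) * (x3 - x4)) / den
    return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))

def fourth_proportional(a, b, c, angle=1.0):
    """Figure 9.1 in coordinates (hypothetical helper): a and c are marked on the positive
    x-axis, b on a second ray from the origin at the given angle (radians).  The line
    through the marks a and b is translated to pass through the mark c; it meets the
    second ray at distance d from the origin, and d = b*c/a."""
    u = (1.0, 0.0)
    v = (math.cos(angle), math.sin(angle))
    A = (a * u[0], a * u[1])
    B = (b * v[0], b * v[1])
    C = (c * u[0], c * u[1])
    C2 = (C[0] + B[0] - A[0], C[1] + B[1] - A[1])   # second point on the parallel through C
    D = intersect_lines(C, C2, (0.0, 0.0), v)
    return math.hypot(*D)

print(fourth_proportional(1, 2, 3))   # a = 1 gives the product 2 * 3, about 6.0
print(fourth_proportional(4, 3, 1))   # c = 1 gives the quotient 3 / 4, about 0.75
```

The angle between the two lines is irrelevant to the result, which is the content of the intercept theorem used in the figure.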


Figure  9.2.  Closure  of  positive  constructible  x  coordinates 
under  square  roots. 

Figure  9.2  indicates  why  the  positive  elements  of  C  are  closed  under  taking 
square  roots.  In  more  detail  let  a  and  b  be  positive  members  of  C  with  a  <  b.  By 
forming  a  circle  whose  diameter  is  a  segment  of  length  b  and  by  forming  a  line 
perpendicular  to  that  line  at  the  point  marked  by  a,  we  determine  the  pictured 
right triangle with a side c satisfying a/c = c/b. Then $c = \sqrt{ab}$. By taking one
of a and b to be 1, we see that the square root of the other of a and b is in C. This
completes  the  proof  of  the  direct  part  of  the  following  theorem. 

Theorem 9.24. The set C of x coordinates that can be constructed from x = 1
and x = 0 by straightedge and compass forms a subfield of ℝ such that the square
root of any positive element of the field lies in the field. Conversely the members
of C are those real numbers lying in some subfield $F_n$ of ℝ of the form

$$F_1 = \mathbb{Q}(\sqrt{a_0}), \quad F_2 = F_1(\sqrt{a_1}), \quad \dots, \quad F_n = F_{n-1}(\sqrt{a_{n-1}})$$

with each $a_j$ in $F_j$ (where $F_0 = \mathbb{Q}$) and with $a_0, \dots, a_{n-1}$ all $> 0$.

PROOF OF CONVERSE. Suppose we have a subfield $F = F_n$ of ℝ of the
kind described in the statement of the theorem. The possibilities for obtaining
a new constructible point from F by an additional construction arise from three
situations: the intersection of two lines, each passing through two points of F;
the intersection of a line and a circle, each determined by data from F; and the
intersection of two circles, each determined by data from F.

In the case of two intersecting lines, each line is of the form ax + by = c for
suitable coefficients a, b, c in F, and the intersection is a point (x, y) in F × F.
So intersections of lines do not force us to enlarge F.

For a line and a circle, we assume that the line is given by ax + by = c with
a, b, c in F, that the circle has radius in F and center in F × F, and that the line
and the circle actually intersect. The circle is then given by $(x-h)^2 + (y-k)^2 = r^2$
with h, k, r in F. Substitution of the equation of the line into the equation of the
circle gives us a quadratic equation either for x, and x then determines y, or for
y, and y then determines x. The quadratic equation has real roots, and thus its
discriminant is ≥ 0. The result is that x and y are in a field $F(\sqrt{t})$ for some
$t \ge 0$ in F.

For two circles, without loss of generality, we may take their equations to be

$$x^2 + y^2 = r^2 \quad\text{and}\quad (x - h)^2 + (y - k)^2 = s^2$$

with r, h, k, s in F. Subtracting gives $2xh + 2yk = h^2 + k^2 - s^2 + r^2$. With this
equation and with $x^2 + y^2 = r^2$, we again have a line and a circle that are being
intersected. Thus the same remarks apply as in the previous paragraph.

The conclusion is that any new single construction of points of intersection by
straightedge and compass leads from F to $F(\sqrt{t})$ for some $t \ge 0$ in F. Thus
every member of the set C is as described in the theorem. □
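The line-and-circle case can be made concrete for F = Q. The sketch below is our own illustration with a hypothetical helper `line_circle_x`: it substitutes the line into the circle using exact rational arithmetic (`fractions.Fraction`) and reports the x coordinates in the form p ± q√t with p, q, t rational, which is exactly the statement that they lie in Q(√t). It assumes b ≠ 0 so that the line can be solved for y; the two-circle case reduces to this one by the subtraction shown in the proof.

```python
from fractions import Fraction as Q
import math

def line_circle_x(a, b, c, h, k, r):
    """Hypothetical helper: x coordinates where the line a*x + b*y = c meets the circle
    (x - h)**2 + (y - k)**2 = r**2, for rational data with b != 0.
    Returns (p, q, t) meaning x = p + q*sqrt(t) or x = p - q*sqrt(t), all in Q(sqrt(t)),
    or None if the discriminant t is negative (the line misses the circle)."""
    a, b, c, h, k, r = (Q(v) for v in (a, b, c, h, k, r))
    m, n = -a / b, (c - b * k) / b          # on the line, y - k = m*x + n
    A = 1 + m * m                           # (x - h)^2 + (m*x + n)^2 - r^2 = A x^2 + B x + C
    B = 2 * m * n - 2 * h
    C = h * h + n * n - r * r
    t = B * B - 4 * A * C                   # the discriminant
    if t < 0:
        return None
    return (-B / (2 * A), 1 / (2 * A), t)

# The line x - y = 0 against the unit circle: x = 0 ± (1/4) * sqrt(8), i.e. ±sqrt(2)/2.
p, q, t = line_circle_x(1, -1, 0, 0, 0, 1)
print(p, q, t)                              # 0 1/4 8
x = float(p) + float(q) * math.sqrt(float(t))
print(abs(x * x - 0.5) < 1e-12)             # True
```

The only irrationality ever introduced is the single square root √t of the discriminant, which is why one construction step enlarges F by at most a quadratic extension.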

To apply the theorem to prove the impossibility of the three never-accomplished
constructions that were described earlier in the section, we observe that $[F_j : F_{j-1}]$
in the theorem equals 1 or 2 for each j. Consequently every member of the
constructible set C lies in a finite algebraic extension of Q of degree $2^k$ for some k.

For the problem of doubling a cube, the question amounts to constructing $\sqrt[3]{2}$.
We argue by contradiction. If $\sqrt[3]{2}$ lies in $F_n$ as in the theorem, then
$\mathbb{Q}(\sqrt[3]{2}) \subseteq F_n$. With k as the integer ≤ n such that $[F_n : \mathbb{Q}] = 2^k$,
Corollary 9.7 gives

$$2^k = [F_n : \mathbb{Q}] = [F_n : \mathbb{Q}(\sqrt[3]{2})]\,[\mathbb{Q}(\sqrt[3]{2}) : \mathbb{Q}] = 3\,[F_n : \mathbb{Q}(\sqrt[3]{2})],$$

the last equality holding because $X^3 - 2$ is irreducible over Q (by Eisenstein's
criterion with the prime 2). Thus 3 must divide a power of 2, and we have arrived at
a contradiction. We conclude that it is not possible to double a cube with straightedge
and compass.

For the problem of trisecting any constructible angle, let us show that a 60°
angle cannot be trisected. A 60° angle is itself constructible, being the angle
between two sides in an equilateral triangle. Trisecting a 60° angle amounts to
constructing cos 20°; sin 20° is then $(1 - \cos^2 20^\circ)^{1/2}$. To proceed, we derive an
equation satisfied by cos 20°, starting from

$$(\cos 20^\circ + i \sin 20^\circ)^3 = \cos 60^\circ + i \sin 60^\circ = \tfrac{1}{2} + \tfrac{\sqrt{3}}{2}\, i.$$

We expand the left side and extract the real part of both sides to obtain

$$\cos^3 20^\circ - 3 \cos 20^\circ \sin^2 20^\circ = \tfrac{1}{2}.$$

Substituting $\sin^2 20^\circ = 1 - \cos^2 20^\circ$ and simplifying, we see that r = cos 20°
satisfies

$$4r^3 - 3r - \tfrac{1}{2} = 0.$$

Arguing with Corollary 8.20 as in Example 2 of splitting fields in Section 2, we
readily check that $4X^3 - 3X - \tfrac{1}{2}$ is irreducible over Q. Hence $[\mathbb{Q}(\cos 20^\circ) : \mathbb{Q}] = 3$,
and we are led to the same contradiction as for the problem of doubling
the cube. Therefore it is not possible to trisect a 60° angle with straightedge and
compass.
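Both cubics above, $X^3 - 2$ and $4X^3 - 3X - \tfrac12$ (equivalently $8X^3 - 6X - 1$ after clearing denominators), are irreducible over Q for the same elementary reason: a cubic that factors over Q has a linear factor, hence a rational root. A minimal Python sketch of that check via the rational root test follows, together with a floating-point confirmation that cos 20° satisfies the derived equation; the function names are our own, and this is not meant as a reconstruction of Corollary 8.20.

```python
import math
from fractions import Fraction

def divisors(n):
    n = abs(n)
    return [d for d in range(1, n + 1) if n % d == 0]

def rational_roots(coeffs):
    """Rational roots of an integer polynomial with coefficients listed from the leading
    term down.  Rational root test: a root p/q in lowest terms has p dividing the
    constant term and q dividing the leading coefficient (nonzero constant term assumed)."""
    lead, const = coeffs[0], coeffs[-1]
    roots = set()
    for p in divisors(const):
        for q in divisors(lead):
            for cand in (Fraction(p, q), Fraction(-p, q)):
                value = Fraction(0)
                for c in coeffs:                  # Horner's rule, exact arithmetic
                    value = value * cand + c
                if value == 0:
                    roots.add(cand)
    return roots

# No rational roots, so these cubics are irreducible over Q.
print(rational_roots([1, 0, 0, -2]))      # X^3 - 2: set()
print(rational_roots([8, 0, -6, -1]))     # 8X^3 - 6X - 1: set()

r = math.cos(math.radians(20))
print(abs(4 * r**3 - 3 * r - 0.5) < 1e-12)  # True
```

The argument is special to degrees 2 and 3; a quartic can factor into two quadratics without having any rational root.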

For the problem of squaring a circle, let A be the area of the circle, and let
r be the radius. If the square has side x, then $x^2 = A = \pi r^2$, with r given.
Thus $x = r\sqrt{\pi}$, and the essence of the matter is to construct $\sqrt{\pi}$. However, π
is known to be transcendental by a theorem of F. Lindemann (1882); we give a
proof in Section 14. Since π is transcendental, $\sqrt{\pi}$ is transcendental, and hence
$\sqrt{\pi}$ lies in no finite algebraic extension of Q; in particular it is not in C.

A fourth notable problem, which leads to further insights, concerns the construction
of a regular polygon of outer radius 1 with n sides. This construction
is  easy  with  straightedge  and  compass  when  n  is  a  power  of  2  or  is  3  times  a 
power  of  2,  and  Euclid  showed  that  a  construction  is  possible  for  n  =  5.  But  a 
construction  cannot  be  managed  with  straightedge  and  compass  for  n  =  9,  for 
example,  because  a  central  angle  in  this  case  is  40°  and  the  constructibility  of 
cos  40°  would  imply  the  constructibility  of  cos  20°.  Thus  the  question  is,  for  what 
values  of  n  can  a  regular  n-gon  be  constructed  with  straightedge  and  compass? 

The remarkable answer was given by Gauss. By a Fermat number is meant
any integer of the form $2^{2^N} + 1$. A Fermat prime is a Fermat number that is
prime. The Fermat numbers for N = 0, 1, 2, 3, 4 are 3, 5, 17, 257, 65537, and
each is a Fermat prime. No larger Fermat primes are known.²

² Many Fermat numbers for N ≥ 5 are known not to be prime, sometimes by the discovery of
an explicit factor and sometimes by a verification that 3 to the power $2^{2^N - 1}$ is not congruent to −1
modulo $2^{2^N} + 1$. (Cf. Lemma 9.46.) For example Euler discovered that 641 divides $2^{32} + 1$.
Computer calculations have shown that $2^{2^N} + 1$ is not prime if 5 ≤ N ≤ 32.

The answer given by Gauss, which we shall prove in stages in Sections 6-9, is as follows.

Theorem 9.25 (Gauss).³ A regular n-gon is constructible with straightedge
and compass if and only if n is the product of distinct Fermat primes and a power
of 2.
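Once the Fermat primes are listed, Theorem 9.25 becomes a simple divisibility test. The sketch below is hypothetical helper code of our own. It hard-codes the five known Fermat primes, so it decides constructibility correctly only for 3 ≤ n ≤ 2^32 (every Fermat number up to that bound is among them, since $2^{32} + 1$ is composite). It also includes the primality check mentioned in the footnote on Fermat numbers (usually called Pépin's test), valid for N ≥ 1.

```python
KNOWN_FERMAT_PRIMES = (3, 5, 17, 257, 65537)    # 2**(2**N) + 1 for N = 0, ..., 4

def pepin_is_prime(N):
    """Pépin's test for F = 2**(2**N) + 1 with N >= 1:
    F is prime exactly when 3**((F - 1) // 2) is congruent to -1 modulo F."""
    F = 2 ** (2 ** N) + 1
    return pow(3, (F - 1) // 2, F) == F - 1

def regular_ngon_constructible(n):
    """Gauss's criterion, restricted to the known Fermat primes (exact for 3 <= n <= 2**32)."""
    if n < 3:
        raise ValueError("a polygon needs at least 3 sides")
    while n % 2 == 0:              # strip the power of 2
        n //= 2
    for p in KNOWN_FERMAT_PRIMES:
        if n % p == 0:
            n //= p
            if n % p == 0:         # a repeated Fermat prime is not allowed
                return False
    return n == 1

print([n for n in range(3, 21) if regular_ngon_constructible(n)])
# [3, 4, 5, 6, 8, 10, 12, 15, 16, 17, 20]
print((2**32 + 1) % 641)           # 0, Euler's factor
```

The case n = 9 is rejected because 3 occurs twice, in agreement with the remark about cos 40° above.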

We  can  show  the  relevance  of  Fermat  primes  right  now,  and  we  can  give  an 
indication that if n is a prime number, then a regular n-gon can be constructed if
and  only  if  n  is  a  Fermat  prime.  But  a  full  proof  even  of  this  statement  will  make 
use  of  Galois  groups,  which  we  take  up  in  the  next  three  sections. 

For the necessity let n be prime, and suppose that a regular n-gon is constructible.
Returning from degrees to radians, we observe that each central angle
is 2π/n. Thus the constructibility implies the constructibility of cos 2π/n, and it
follows that $e^{2\pi i/n} = \cos 2\pi/n + i \sin 2\pi/n$ is in the field C + iC of constructible
points in the complex plane. We have the factorization

$$X^n - 1 = (X - 1)(X^{n-1} + X^{n-2} + \cdots + X + 1),$$

and $e^{2\pi i/n}$ is a root of the second factor. The first example of Eisenstein's criterion
(Corollary 8.22) in Section VIII.5 shows that the second factor is irreducible.
According to the results of Section 1, $\mathbb{Q}(e^{2\pi i/n})$ is a simple algebraic extension
of Q of degree n − 1.

Applying Theorem 9.24, we see that n − 1 must be a power of two. Let us
write $n - 1 = 2^m$. Suppose $m = a2^N$ with a odd. If a > 1, then the equality

$$n = 2^m + 1 = \big(2^{2^N}\big)^a + 1^a$$

exhibits n as the sum of two a-th powers, necessarily divisible by $2^{2^N} + 1$, which is
greater than 1 and less than n. Since n is assumed prime, we conclude that a = 1.
Therefore $n = 2^{2^N} + 1$, and n is a Fermat prime.
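The divisibility step can be checked mechanically. In the following sketch (a hypothetical helper of our own, not from the text) we split m as a·2^N with a odd and confirm that $2^{2^N} + 1$ divides $2^m + 1$, using only that $x^a + 1$ is divisible by $x + 1$ when a is odd; the divisor is proper exactly when m is not a power of 2.

```python
def fermat_type_factor(m):
    """For m = a * 2**N with a odd, return 2**(2**N) + 1, which divides 2**m + 1
    because x**a + 1 is divisible by x + 1 for odd a (here x = 2**(2**N))."""
    N = 0
    while m % 2 == 0:
        m //= 2
        N += 1
    return 2 ** (2 ** N) + 1

for m in range(1, 200):
    d = fermat_type_factor(m)
    assert (2 ** m + 1) % d == 0
    # d is a proper divisor exactly when m is not a power of 2 (odd part a > 1)
    assert (d < 2 ** m + 1) == ((m & (m - 1)) != 0)

print(fermat_type_factor(12), (2**12 + 1) // 17)   # 17 241
```

So $2^{12} + 1 = 4097 = 17 \cdot 241$ is not prime, as the argument predicts for m = 12 = 3 · 2².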

We do not quite succeed in proving the converse at this point. If n is the Fermat
prime $2^{2^N} + 1$, then the above argument shows that the degree of $\mathbb{Q}(e^{2\pi i/n})$ over
Q is $2^{2^N}$. However, we cannot yet conclude that $\mathbb{Q}(e^{2\pi i/n})$ can be built from Q
by successively adjoining $2^N$ square roots, and thus the converse part of Theorem
9.24  is  not  immediately  applicable.  Once  we  have  the  theory  of  Galois  groups  in 
hand,  we  shall  see  that  the  existence  of  these  intermediate  extensions  involving 
square  roots  is  ensured,  and  then  the  constructibility  follows. 


³ Gauss announced both the necessity and the sufficiency in this theorem in his Disquisitiones
Arithmeticae  in  1801,  but  he  included  a  proof  of  only  the  sufficiency  (partly  in  his  articles  336  and 
365).  A  proof  of  the  necessity  appeared  in  a  paper  of  Pierre-Laurent  Wantzel  in  1837. 
6. Separable Extensions

The  Galois  group  Gal(K/k)  of  a  field  extension  K/k  is  defined  to  be  the  set 
Gal(K/k)  =  {k  automorphisms  of  K} 

with composition as group operation. An instance of this group was introduced in
the context of Example 9 of Section IV.1; in this example the field k was the field
Q of rationals and the field K was a number field Q[θ], where θ is algebraic over
Q. In studying Gal(K/k) in this chapter, we ordinarily assume that $\dim_k K < \infty$,
but there will be instances where we do not want to make such an assumption.

Beginning in this section, we take up a study of Galois groups in general.
We shall be interested in relationships between fields L with k ⊆ L ⊆ K and
subgroups of Gal(K/k). If H is a subgroup of Gal(K/k), then

$$K^H = \{x \in K \mid \varphi(x) = x \text{ for all } \varphi \in H\}$$

is a field called the fixed field of H; it provides an example of an intermediate
field L and gives a hint of the relationships we shall investigate. We begin with
some examples; in each case the base field k is the field Q of rationals.

Examples  of  Galois  groups. 

(1a) $K = \mathbb{Q}(\sqrt{-1})$. If φ is in Gal(K/Q), then we must have $\varphi|_{\mathbb{Q}} = 1$, and
$\varphi(\sqrt{-1})$ must be a root of $X^2 + 1$. Thus $\varphi(\sqrt{-1}) = \pm\sqrt{-1}$. Since Q and
$\sqrt{-1}$ generate $\mathbb{Q}(\sqrt{-1})$, there are at most two such φ's. On the other hand,
$\mathbb{Q}(\sqrt{-1})$ and $\mathbb{Q}(-\sqrt{-1})$ are simple extensions of Q such that $\sqrt{-1}$ and $-\sqrt{-1}$
have the same minimal polynomial. Theorem 9.11 therefore produces a Q automorphism
φ of $\mathbb{Q}(\sqrt{-1})$ with $\varphi(\sqrt{-1}) = -\sqrt{-1}$, namely complex conjugation.
We conclude that Gal(K/Q) has order 2, hence that $\mathrm{Gal}(K/\mathbb{Q}) \cong C_2$.

(1b) $K = \mathbb{Q}(\sqrt{2})$. The same argument applies as in Example 1a, and the
conclusion is that $\mathrm{Gal}(K/\mathbb{Q}) \cong C_2$. The nontrivial element of the Galois group
carries $\sqrt{2}$ to $-\sqrt{2}$ and is different from complex conjugation.

(2) $K = \mathbb{Q}(\sqrt[3]{2})$. If φ is in Gal(K/Q), then $\varphi|_{\mathbb{Q}} = 1$, and $\varphi(\sqrt[3]{2})$ has to be
a root of $X^3 - 2$. But K is a subfield of ℝ, and there is only one root of $X^3 - 2$
in ℝ. Hence $\varphi(\sqrt[3]{2}) = \sqrt[3]{2}$. Since Q and $\sqrt[3]{2}$ generate $\mathbb{Q}(\sqrt[3]{2})$ as a field, we see
that φ = 1. We conclude that Gal(K/Q) has order 1, i.e., is the trivial group.

(3) K = Q(r), where r is a root of $X^3 - X - \tfrac{1}{3}$. Any φ in Gal(K/Q) fixes Q
and sends r to a root of $X^3 - X - \tfrac{1}{3}$. In Example 2 of splitting fields in Section 2,
we saw that all three complex roots of $X^3 - X - \tfrac{1}{3}$ lie in K. Arguing as in
Example 1a, we see that Gal(K/Q) has order 3, hence that $\mathrm{Gal}(K/\mathbb{Q}) \cong C_3$.
(4) $K = \mathbb{Q}(e^{2\pi i/17})$. According to Section 5, this is the field we need to
consider in addressing the constructibility of a regular 17-gon. We saw in that
section that [K : Q] = 16 and that the minimal polynomial of $e^{2\pi i/17}$ over Q
is $X^{16} + X^{15} + \cdots + X + 1$. The other roots of the minimal polynomial in
ℂ are $e^{2\pi i l/17}$ for 2 ≤ l ≤ 16, and these all lie in K. Theorem 9.11 therefore
gives us a Q automorphism $\varphi_l$ of K sending $e^{2\pi i/17}$ into $e^{2\pi i l/17}$ for each l with
1 ≤ l ≤ 16. Since Q and $e^{2\pi i/17}$ generate K, a Q automorphism of K is
completely determined by its effect on $e^{2\pi i/17}$. Thus the order of Gal(K/Q)
is 16. Let us determine the group structure. Since $\varphi_l$ sends $e^{2\pi i/17}$ into $e^{2\pi i l/17}$, it
sends $e^{2\pi i r/17} = (e^{2\pi i/17})^r$ into $(e^{2\pi i l/17})^r = e^{2\pi i l r/17}$. If we drop the exponential
from the notation, we can think of $\varphi_l$ as defined on the integers modulo 17, the
formula being $\varphi_l(r) = rl \bmod 17$. From this viewpoint $\varphi_l$ is an automorphism
of the additive group of $\mathbb{F}_{17}$. Lemma 4.45 shows that the group of additive
automorphisms of $\mathbb{F}_{17}$ is isomorphic to $\mathbb{F}_{17}^\times$, and it follows from Corollary 4.27
that $\mathrm{Gal}(K/\mathbb{Q}) \cong C_{16}$. For our application of constructibility of a regular 17-gon,
we would like to know whether the elements of K are constructible. Taking
Theorem 9.24 into account, we therefore seek an intermediate field L of which
K is a quadratic extension. Since we know that Gal(K/Q) is cyclic, we can let
$H \subseteq \mathrm{Gal}(K/\mathbb{Q}) \cong C_{16}$ be the 2-element subgroup, and it is natural to try the
fixed field $L = K^H$. To understand this fixed field, we need to understand the
isomorphism $\mathbb{F}_{17}^\times \cong C_{16}$ better. Modulo 17, we have

$$3^2 \equiv 9, \qquad 3^4 \equiv 13 \equiv -4, \qquad 3^8 \equiv 16 \equiv -1, \qquad 3^{16} \equiv 1.$$

Consequently 3 is a generator of the cyclic group $\mathbb{F}_{17}^\times$, since its order divides 16
but not 8. Then $H = \{3^8, 1\} = \{\pm 1\}$, and $L = \{x \in K \mid \varphi_{-1}(x) = \varphi_{+1}(x) = x\}$.
Since $\varphi_{-1}(e^{2\pi i/17}) = e^{-2\pi i/17} = \overline{e^{2\pi i/17}}$, with the overbar indicating complex
conjugation, we see that

$$L = K^H = \{x \in K \mid \bar{x} = x\}.$$

It is not hard to check that indeed [K : L] = 2. Next we need a subfield L' of
L with [L : L'] = 2. We try $L' = K^{H'}$ with H' equal to the 4-element cyclic
subgroup of Gal(K/Q). Here we have a harder time checking whether L is indeed
a quadratic extension of L', but we shall see in Section 8 that it is.⁴ We continue
in this way, and ultimately we end up with the chain of subfields that exhibits the
members of K as constructible.
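The modular arithmetic in this example is easy to confirm by machine. The short Python sketch below (our own illustration; the helper `subgroup` is hypothetical) checks that 3 generates $\mathbb{F}_{17}^\times$ and lists, as sets of residues l, the subgroups of order 2, 4, and 8. Under $l \mapsto \varphi_l$ these are the subgroups H, H', and the next one in the chain of fixed fields.

```python
p = 17
powers = [pow(3, e, p) for e in range(1, 17)]
# 3 generates the multiplicative group mod 17: its 16 powers are all distinct
assert sorted(powers) == list(range(1, 17))
print(pow(3, 2, p), pow(3, 4, p), pow(3, 8, p))   # 9 13 16   (16 is -1 mod 17)

def subgroup(order):
    """The unique subgroup of the given order in the cyclic group (Z/17Z)^x,
    generated by 3**(16 // order)."""
    g = pow(3, 16 // order, p)
    return sorted({pow(g, e, p) for e in range(order)})

print(subgroup(2))   # [1, 16]          H  = {+1, -1}
print(subgroup(4))   # [1, 4, 13, 16]   H' = {+1, -1, +4, -4}
print(subgroup(8))   # [1, 2, 4, 8, 9, 13, 15, 16]
```

Because the group is cyclic of order 16, there is exactly one subgroup of each order 1, 2, 4, 8, 16, so the chain of fixed fields is forced.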

We seek to formulate the kind of argument in the above examples as a general
theorem. We have to rule out the bad behavior of $\mathbb{Q}(\sqrt[3]{2})$, where one root of the
minimal polynomial lies in the field but others do not, and we shall do this by
assuming that the extension field is a "normal" extension, in a sense to be defined
in Section 7. In addition, our style of argument shows that we might run into
trouble if our irreducible polynomials over k can have repeated roots in K. We
shall rule out this bad behavior by insisting that the extension be "separable," a
condition that we introduce now. The extension will automatically be separable
if K has characteristic 0.

⁴ Actually, Section 8 will point out how Corollary 9.36 in Section 7 already handles this step. In
fact, Corollary 9.37 handles this step with no supplementary argument.

For the remainder of this section, fix the base field k. An irreducible polynomial
F(X) in k[X] is called separable if it splits into distinct degree-one factors in its
splitting field, i.e., if

$$F(X) = a_n(X - x_1)\cdots(X - x_n) \quad\text{with } x_i \ne x_j \text{ for } i \ne j.$$

Once  this  splitting  into  distinct  degree-one  factors  occurs  in  the  splitting  field,  it 
occurs  in  any  larger  field  as  well. 

Lemma 9.26. A polynomial F(X) in k[X] has no repeated roots in its splitting
field K if and only if GCD(F, F') = 1, where F'(X) is the derivative of F(X).

PROOF. The polynomial F(X) has repeated roots in K if and only if F(X) is
divisible by $(X - r)^2$ for some r ∈ K, if and only if some r ∈ K has F(r) =
F'(r) = 0 (by Corollary 9.17), if and only if some r ∈ K has (X − r) dividing
F(X) and also F'(X) (by the Factor Theorem), if and only if some r ∈ K has
(X − r) dividing GCD(F, F') when the GCD is computed in K, if and only
if GCD(F, F') ≠ 1 when the GCD is computed in K (by unique factorization
in K[X]). However, the Euclidean algorithm calculates GCD(F, F') without
reference to the field, and the GCD is therefore the same when computed in K as
it is when computed in k. The lemma follows. □
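Lemma 9.26 gives a test for repeated roots that never needs the roots themselves. Below is a minimal Python sketch for polynomials with rational coefficients; coefficient lists run from the constant term upward, and the helper names are our own. As the proof observes, the Euclidean algorithm never leaves the field of coefficients, so exact `Fraction` arithmetic over Q suffices.

```python
from fractions import Fraction

def trim(f):
    """Drop vanishing leading coefficients (f lists coefficients from the constant term up)."""
    f = list(f)
    while f and f[-1] == 0:
        f.pop()
    return f

def derivative(f):
    return trim([i * c for i, c in enumerate(f)][1:])

def poly_mod(f, g):
    """Remainder of f on division by the nonzero polynomial g, over Q."""
    f = [Fraction(c) for c in trim(f)]
    g = trim(g)
    while len(f) >= len(g):
        factor = f[-1] / Fraction(g[-1])
        shift = len(f) - len(g)
        for i, c in enumerate(g):
            f[shift + i] -= factor * c
        f = trim(f)
    return f

def poly_gcd(f, g):
    """Monic GCD by the Euclidean algorithm (f and g not both zero)."""
    f, g = trim(f), trim(g)
    while g:
        f, g = g, poly_mod(f, g)
    return [Fraction(c) / f[-1] for c in f]

def has_no_repeated_roots(f):
    return poly_gcd(f, derivative(f)) == [1]

print(has_no_repeated_roots([1, 0, 1]))          # X^2 + 1: True
print(poly_gcd([2, -3, 0, 1], [-3, 0, 3]))       # (X-1)^2 (X+2) and its derivative: X - 1
```

The second call returns the coefficient list of X − 1, exhibiting the repeated root 1 of $X^3 - 3X + 2 = (X - 1)^2(X + 2)$.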

Proposition 9.27. An irreducible polynomial F(X) in k[X] is separable if
and only if F'(X) ≠ 0. In particular, every irreducible (necessarily nonconstant)
polynomial is separable if k has characteristic 0.

PROOF. Since the polynomial F(X) is irreducible and GCD(F, F') divides
F(X), GCD(F, F') equals 1 or F(X) in all cases. If F'(X) = 0, then GCD(F, F')
= F(X), and Lemma 9.26 implies that F(X) is not separable. Conversely
if F'(X) ≠ 0, then the facts that GCD(F, F') divides F'(X) and that deg F' <
deg F together imply that GCD(F, F') cannot equal F(X). So GCD(F, F') = 1,
and Lemma 9.26 implies that F(X) is separable. □

Fix an algebraic extension K of k. We say that an element x of K is separable
over k if the minimal polynomial of x over k is separable. We say that K is a
separable extension of k if every x in K is separable over k.
Examples of separable extensions and extensions not separable.

(1) In characteristic 0, every algebraic extension K of k is separable, by
Proposition 9.27.

(2) Every algebraic extension K of a finite field k is separable. In fact, if x is
in K, then [k(x) : k] is finite. Hence k(x) is a finite field. Replacing K by k(x), we
may assume that K is a finite field, say of order $q = p^n$ with p prime. Since the
multiplicative group $K^\times$ has order q − 1, every nonzero element of K is a root of
$X^{q-1} - 1$, and every element of K is therefore a root of $X^q - X$. The minimal
polynomial F(X) of x over k must then divide $X^q - X$. However, we know that
$X^q - X$ splits over K and has no repeated roots. Thus F(X) splits over K and has
no repeated roots. Then F(X) is separable over k, and x is separable over k.

(3) Let $k = \mathbb{F}_p(x)$ be a transcendental extension of the finite field $\mathbb{F}_p$. Because
this extension is transcendental, $X^p - x$ is irreducible over k. Let K be the
simple algebraic extension $k[X]/(X^p - x)$, which we can write more simply as
$k(x^{1/p})$. The minimal polynomial of $x^{1/p}$ over k is $X^p - x$, and its derivative is
$pX^{p-1} = 0$, since the derivative of the constant x is 0 and p = 0 in k. By
Proposition 9.27, $x^{1/p}$ is not separable over k.
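Examples 2 and 3 both come down to differentiating in characteristic p. The sketch below uses a helper of our own, restricted to polynomials over the prime field $\mathbb{F}_p$, and lets the constant term be an arbitrary placeholder since differentiation discards it. It shows that $X^p - X$ has derivative −1, so GCD(F, F') = 1, while $X^p - x$ has derivative 0. For $q = p^n$ the same computation applies to $X^q - X$, because q ≡ 0 modulo p.

```python
def derivative_mod_p(f, p):
    """Formal derivative over a field of characteristic p.  f lists coefficients from the
    constant term upward; integer coefficients are reduced mod p.  The constant term is
    never used, so it may be any placeholder (such as the transcendental x of Example 3)."""
    d = [(i * c) % p for i, c in enumerate(f) if i > 0]
    while d and d[-1] == 0:      # drop vanishing leading coefficients
        d.pop()
    return d                     # [] represents the zero polynomial

p = 5

# Example 2 with q = p: X^p - X has derivative p*X^(p-1) - 1 = -1, a nonzero constant,
# so GCD(F, F') = 1 and X^p - X has no repeated roots (Lemma 9.26).
print(derivative_mod_p([0, -1] + [0] * (p - 2) + [1], p))     # [4], that is, -1 mod 5

# Example 3: X^p - x has derivative p*X^(p-1) = 0, the zero polynomial.
print(derivative_mod_p(["-x"] + [0] * (p - 1) + [1], p))      # []
```

In characteristic 0 the leading term of a nonconstant polynomial never differentiates to zero, which is the content of the second sentence of Proposition 9.27.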

The way that separability enters considerations with Galois groups is through
the following theorem, explicitly or implicitly. One of the corollaries of the
theorem is that if K/k is an algebraic extension, then the set of elements in K
separable over k is a subfield of K.

Theorem 9.28. Let k ⊆ L ⊆ K be an inclusion of fields such that K is a
simple algebraic extension of L of the form K = L(α), let $\overline{K}$ be an algebraic
closure of K, and let M(X) be the minimal polynomial of α over L. Then the
number of field mappings of K into $\overline{K}$ fixing k is the product of the number of
distinct roots of M(X) in $\overline{K}$ by the number of field mappings of L into $\overline{K}$ fixing k.

Remarks. An algebraic closure $\overline{K}$ of K exists by Theorem 9.22. Because $\overline{K}$
is known to exist, the present theorem reduces to Theorem 9.11 when L = k.

PROOF. Any field mapping $\varphi : K \to \overline{K}$ is uniquely determined by $\varphi|_L$ and
φ(α). If $\sigma = \varphi|_L$, then the equality M(α) = 0 implies that $M^\sigma(\varphi(\alpha)) = 0$, where
$M^\sigma$ denotes the polynomial obtained by applying σ to the coefficients of M, and
thus φ(α) has to be a root of $M^\sigma(X)$. The number of distinct roots of $M^\sigma(X)$
in $\overline{K}$ equals the number of distinct roots of M(X) in $\overline{K}$; hence the number of
possibilities for φ(α) is at most the number of distinct roots of M(X) in $\overline{K}$.
Consequently the number of such φ's fixing k is bounded above by the product
of the number of distinct roots of M(X) in $\overline{K}$ times the number of field mappings
σ of L into $\overline{K}$ fixing k.

For an inequality in the reverse direction, let $\sigma : L \to \overline{K}$ be any field mapping
of L into $\overline{K}$ fixing k, put L' = σ(L), let x be any root of $M^\sigma(X)$, and form the
subfield L'(x) of $\overline{K}$. Theorem 9.11 shows that there exists a field isomorphism
$\varphi : L(\alpha) \to L'(x)$ with $\varphi|_L = \sigma$ and φ(α) = x, and we can regard φ as a
field mapping of K into $\overline{K}$ fixing k, extending σ, and having φ(α) = x. Thus
the number of field mappings $\varphi : K \to \overline{K}$ fixing k is bounded below by the
product of the number of distinct roots of M(X) in $\overline{K}$ times the number of field
homomorphisms σ of L into $\overline{K}$ fixing k. □

Corollary 9.29. Let $K = k(\alpha_1, \dots, \alpha_n)$ be a finite algebraic extension of
the field k, and let $\overline{K}$ be an algebraic closure of K. Then the number of field
mappings of K into $\overline{K}$ fixing k is ≤ [K : k]. Moreover, the following conditions
are equivalent:

(a) the number of field mappings of K into $\overline{K}$ fixing k equals [K : k],

(b) each $\alpha_j$ is separable over $k(\alpha_1, \dots, \alpha_{j-1})$ for 1 ≤ j ≤ n,

(c) each $\alpha_j$ is separable over k for 1 ≤ j ≤ n.

PROOF. The minimal polynomial of $\alpha_j$ over $k(\alpha_1, \dots, \alpha_{j-1})$ divides the minimal
polynomial of $\alpha_j$ over k. If the second of these polynomials has distinct
roots in its splitting field, so does the first. Thus (c) implies (b).

For 1 ≤ j ≤ n, let the minimal polynomial of $\alpha_j$ over $k(\alpha_1, \dots, \alpha_{j-1})$ be
$M_j(X)$, let $d_j$ be the degree of $M_j(X)$, and let $s_j$ be the number of distinct roots
of $M_j(X)$ in $\overline{K}$. Then $s_j \le d_j$ with equality for a particular j if and only if $\alpha_j$
is separable over $k(\alpha_1, \dots, \alpha_{j-1})$, by definition. Also, $[K : k] = \prod_{j=1}^n d_j$ by
Corollary 9.7, and the number of field mappings of K into $\overline{K}$ fixing k is $\prod_{j=1}^n s_j$
by iterated application of Theorem 9.28. From these facts, the first conclusion of
the corollary is immediate, and so is the equivalence of (a) and (b).

Condition (a) is independent of the order of enumeration of $\alpha_1, \dots, \alpha_n$. Since
we can always take any particular $\alpha_j$ to be first, we see that (a) implies (c). □

Corollary 9.30. Let $K = k(\alpha_1, \dots, \alpha_n)$ be a finite algebraic extension of the
field k. If each $\alpha_j$ for 1 ≤ j ≤ n is separable over k, then K/k is a separable
extension.

PROOF. Let β be in K. We apply the equivalence of (a) and (c) in Corollary
9.29 once to the set of generators $\{\alpha_1, \dots, \alpha_n\}$ and once to the set of generators
$\{\beta, \alpha_1, \dots, \alpha_n\}$, and the result is immediate. □

Corollary  9.31.  If  K/k  is  an  algebraic  field  extension,  then  the  subset  L  of 
elements  of  K  that  are  separable  over  k  is  a  subfield  of  K. 

PROOF. If α and β are given in L, we apply Corollary 9.30 to the extension
k(α, β) of k to see that L contains the subfield generated by k and the elements
α and β. □
Proposition 9.32. If K/k is a separable algebraic extension and if L is a field
with k ⊆ L ⊆ K, then K is separable over L, and L is separable over k.

PROOF. The separability assertion about L/k says the same thing about
elements of L that separability of K/k says about those same elements, and it is
therefore immediate that L/k is separable.

Next let us consider K/L. If x is in K, let F(X) be its minimal polynomial
over k, and let G(X) be its minimal polynomial over L. Since F(X) is in L[X]
and F(x) = G(x) = 0, G(X) divides F(X). Since K/k is separable, F(X)
splits into distinct degree-one factors in its splitting field $\mathbb{F}$. The field $\mathbb{F}$ contains
the splitting field of G(X), and thus the degree-one factors of G(X) in $\mathbb{F}[X]$ are a
subset of the degree-one factors of F(X) in $\mathbb{F}[X]$. There are no repeated factors
for F(X), and there can be no repeated factors for G(X). Thus x is separable
over L, and K/L is a separable extension. □

In  studying  Galois  groups,  we  shall  be  chiefly  interested  in  the  following 
situation in Corollary 9.29: K is an algebraic field extension $K = k(\alpha_1, \dots, \alpha_n)$
of  k  for  which  every  field  mapping  of  K  into  an  algebraic  closure  that  fixes  k 
actually  carries  K  into  itself.  We  seek  conditions  under  which  this  situation  arises, 
and  then  we  mine  the  consequences.  As  we  did  in  the  study  begun  in  Theorem 
9.28,  we  begin  with  the  case  of  a  simple  algebraic  extension. 

Let K = k(γ) be a simple algebraic extension of k, and let F(X) be the
minimal polynomial of γ over k. Any member φ of the Galois group Gal(K/k)
carries γ to a root γ' of F(X), and φ is uniquely determined by γ' since
k and γ generate the field K. An element φ of Gal(K/k) carrying γ to γ' can
exist only if γ' is in K. If γ' is in K, then k(γ) ⊇ k(γ'), and the equal finite
dimensionality of k(γ) and k(γ') forces k(γ) = k(γ'). In other words, if γ' is
in K, then the unique k isomorphism k(γ) → k(γ') of Theorems 9.10 and 9.11
carrying γ to γ' is a member of Gal(K/k). Making a count of what happens to
all the elements γ', we see that we have proved the following.

Proposition 9.33. Let K = k(γ) be a simple algebraic extension of k, and let
F(X) be the minimal polynomial of γ. Then

$$|\mathrm{Gal}(K/k)| \le [K : k]$$

with equality if and only if F(X) is a separable polynomial and K is the splitting
field of F(X) over k.

Example. For $K = \mathbb{Q}(\sqrt[3]{2})$ with minimal polynomial $F(X) = X^3 - 2$, we know that
F(X) does not split in K; the nonreal roots of F(X) do not lie in K. Proposition
9.33 gives us |Gal(K/Q)| < [K : Q] = 3, and a glance at the argument preceding
Proposition 9.33 shows that |Gal(K/Q)| has to be 1.
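For $K = \mathbb{Q}(\sqrt[3]{2})$ it helps to compare the count of field mappings in Theorem 9.28 and Corollary 9.29 with the count of automorphisms in this example. The floating-point sketch below is our own illustration. It uses ℂ in place of an algebraic closure (ℂ contains one, the algebraic numbers) and an ad hoc tolerance of 1e-12. It finds three distinct images for $\sqrt[3]{2}$, hence three field mappings of K into ℂ, but only one of them lands in K ⊂ ℝ.

```python
import cmath

# The three complex roots of X^3 - 2: the possible images of 2**(1/3) under field
# mappings Q(2**(1/3)) -> C fixing Q (Theorem 9.28 with L = Q).
roots = [2 ** (1 / 3) * cmath.exp(2j * cmath.pi * k / 3) for k in range(3)]
assert all(abs(z**3 - 2) < 1e-12 for z in roots)

# Three distinct roots give three field mappings, matching [K : Q] = 3 (Corollary 9.29) ...
print(len(roots))                                      # 3
# ... but only the real root lies in K, a subfield of R, so only one mapping is an automorphism.
print(sum(1 for z in roots if abs(z.imag) < 1e-12))    # 1
```

The gap between 3 and 1 is exactly the failure of K to contain the other roots of F(X), which is the condition that Proposition 9.33 isolates.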
It is possible to investigate the case of several generators directly, but it is more
illuminating  to  reduce  it  to  the  case  of  a  single  generator  as  in  Proposition  9.33. 
The  tool  for  doing  so  is  the  following  important  theorem. 


Theorem 9.34 (Theorem of the Primitive Element). Let K/k be a separable
algebraic extension with [K : k] < ∞. Then there exists an element γ in K such
that K = k(γ).
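A standard illustration, chosen by us rather than taken from the text, is $K = \mathbb{Q}(\sqrt{2}, \sqrt{3})$, a separable extension of degree 4 over Q. In the notation of the proof that follows, take α = √2, β = √3, and c = 1. The quotients $(\beta_j - \beta)(\alpha - \alpha_i)^{-1}$ with $\alpha_i = -\sqrt{2}$ are 0 and $-\sqrt{3/2}$, so c = 1 is allowed. Indeed $\gamma = \sqrt{3} + \sqrt{2}$ satisfies $\gamma^3 = 11\sqrt{2} + 9\sqrt{3}$, hence $\sqrt{2} = (\gamma^3 - 9\gamma)/2$ and $\sqrt{3} = (11\gamma - \gamma^3)/2$ both lie in Q(γ). The floating-point sketch below only checks these identities numerically; it is not a proof.

```python
import math

s2, s3 = math.sqrt(2), math.sqrt(3)
c = 1
gamma = s3 + c * s2            # gamma = beta + c*alpha with alpha = sqrt(2), beta = sqrt(3)

# c must avoid the quotients (beta_j - beta) / (alpha - alpha_i) with alpha_i != alpha
alphas, betas = [s2, -s2], [s3, -s3]
bad = [(bj - s3) / (s2 - ai) for ai in alphas[1:] for bj in betas]
assert all(abs(q - c) > 1e-9 for q in bad)        # bad is [0.0, -sqrt(3/2)]

# Both generators are polynomials in gamma with rational coefficients
assert abs((gamma**3 - 9 * gamma) / 2 - s2) < 1e-9
assert abs((11 * gamma - gamma**3) / 2 - s3) < 1e-9
# gamma is a root of X^4 - 10X^2 + 1, consistent with [Q(gamma) : Q] = 4
assert abs(gamma**4 - 10 * gamma**2 + 1) < 1e-9
print(gamma)
```

Any rational c other than 0 would work equally well here, since the only rational bad quotient is 0; in general the proof only guarantees that all but finitely many c in k succeed.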

PROOF. We may assume that k is infinite because Corollary 4.27 shows that
the multiplicative group of a finite field is cyclic: if k is finite, then so is K, and any
generator γ of the cyclic group $K^\times$ satisfies K = k(γ). With k infinite, we can write
$K = k(x_1, \dots, x_n)$, and we proceed by induction on n, the case n = 1 being
trivial. For general n, let $L = k(x_1, \dots, x_{n-1})$, so that $K = L(x_n)$. By the
inductive hypothesis (L/k being separable by Proposition 9.32), L is of the form
L = k(α) for some α in K, and thus $K = k(\alpha, x_n)$. Changing notation, we see that it
is enough to prove that whenever K is a separable algebraic extension of the form
K = k(α, β), then K is of the form K = k(γ) for some γ. We shall show this for γ of
the form γ = β + cα for some c in k.

Let F(X) and G(X) be the minimal polynomials of α and β over k, and let K' be an extension in which F(X)G(X) splits, i.e., in which F(X) and G(X) both split. Let α_1 = α, α_2, ..., α_m and β_1 = β, β_2, ..., β_n be the roots of F(X) and G(X) in K'. In each case the roots are necessarily distinct by definition of separability of α and β. Define L = k(y) with y = β + cα, where c is a member of k yet to be specified. For suitable c, we shall show that α is in L. Then β = y − cα must be in L, and we obtain K ⊆ L. Since y is in K, the reverse inclusion is built into the construction, and thus we will have K = L.

We shall compute the minimal polynomial of α over L. We know that α is a root of F(X), and we put H(X) = G(y − cX). Then H(X) is in L[X] ⊆ K'[X], and G(β) = 0 implies H(α) = 0. Therefore X − α divides both F(X) and H(X) in the ring K'[X]. Let us determine GCD(F, H) in K'[X]. The separability of α says that X − α divides F(X) only once. Since F(X) splits in K'[X], any other prime divisor of GCD(F, H) in K'[X] has to be of the form X − α_i with i ≠ 1. The definition of H(X) gives H(α_i) = G(y − cα_i). If G(y − cα_i) = 0, then y − cα_i = β_j for some j, with the consequence that β + cα − cα_i = β_j and c = (β_j − β)(α − α_i)^{−1}. Since k is an infinite field, we can choose c in k different from all the finitely many quotients (β_j − β)(α − α_i)^{−1}. For such a choice of c, GCD(F, H) = X − α in K'[X]. Then GCD(F, H) = X − α, up to a scalar factor, in L[X], since F(X) and H(X) are in L[X] and since the GCD, computed by the Euclidean algorithm, does not depend on which field containing the coefficients we work over. The ratio of the constant term to the coefficient of X has to be in L independently of the scalar factor multiplying X − α, and therefore α is in L. This completes the proof. □
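The choice of c in the proof can be made concrete in a small case. The sketch below is only an illustration, and its function names are hypothetical: it assumes k = Q, α = √2 and β = √3 (so α_2 = −√2 and β_2 = −√3), approximates the roots by floating-point complex numbers, lists the finitely many forbidden quotients (β_j − β)(α − α_i)^{−1} with i ≠ 1, and returns the first integer c that avoids them.

```python
import cmath

def forbidden_values(alpha_roots, beta_roots):
    """The quotients (beta_j - beta)/(alpha - alpha_i), i != 1, from the proof.

    alpha_roots[0] plays the role of alpha and beta_roots[0] that of beta.
    """
    alpha, beta = alpha_roots[0], beta_roots[0]
    return [(b_j - beta) / (alpha - a_i)
            for a_i in alpha_roots[1:]
            for b_j in beta_roots]

def choose_c(alpha_roots, beta_roots, tol=1e-9):
    """First integer c = 0, 1, 2, ... that avoids every forbidden quotient."""
    bad = forbidden_values(alpha_roots, beta_roots)
    c = 0
    while any(abs(c - value) < tol for value in bad):
        c += 1
    return c

# Assumed example: alpha = sqrt(2) with conjugate -sqrt(2),
# beta = sqrt(3) with conjugate -sqrt(3).
alpha_roots = [cmath.sqrt(2), -cmath.sqrt(2)]
beta_roots = [cmath.sqrt(3), -cmath.sqrt(3)]
c = choose_c(alpha_roots, beta_roots)
print(c)   # 1, so y = sqrt(3) + sqrt(2) generates Q(sqrt(2), sqrt(3))
```

Here the forbidden values are 0 and −√(3/2); c = 0 is excluded because y = β alone cannot recover α, and c = 1 already works. Since the forbidden set is finite and k is infinite, a search of this kind always terminates.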
In using Galois groups to help in understanding field extensions, an example to keep in mind is the extension Q(∛2)/Q. In this case the Galois group is trivial and therefore gives us no information about the extension. Thus it makes sense to regard the failure of equality to hold in an inequality |Gal(K/k)| ≤ [K : k] as an undesirable situation.⁵

Proposition 9.33 suggests that the failure of equality to hold in the inequality |Gal(K/k)| ≤ [K : k] has something to do with two phenomena. One is the possible failure of some polynomials over k to be separable, and the other is the failure of polynomials over k to split fully in K once they have at least one root in K. Having examined separability in Section 6, we turn to this question of full splitting of polynomials.

Accordingly, we make a definition, choosing among several equivalent conditions the one that is usually the easiest to check in practice. A finite⁶ algebraic extension K of a field k is said to be normal over k if K is the splitting field of some F(X) in k[X]. The following proposition gives some equivalent formulations of this condition.

Proposition 9.34A. Let K be a finite algebraic extension of a field k, and regard K as contained in a fixed algebraic closure K̄. Then the following conditions on K are equivalent.

(a) K is the splitting field of some F(X) in k[X], i.e., K is normal over k,

(b) every irreducible polynomial M(X) in k[X] with a root in K splits in K, i.e., K contains the splitting field for each such M(X),

(c) every k isomorphism of K into K̄ carries K into itself.

Remark.  Although  (a)  is  often  the  easiest  of  the  conditions  to  check,  (b)  is 
often  the  easiest  to  disprove.  It  is  therefore  quite  handy  to  know  the  equivalence. 

Proof. Suppose that (a) holds. Let F(X) be as in (a), and let its roots be y_1, ..., y_n. Let M(X) be an irreducible polynomial in k[X] with a root α in K, and let L be the splitting field of M(X) over K. Let β be any root of M(X) in L. Since M(X) is irreducible over k, Theorem 9.11 produces a k isomorphism σ of k(α) onto k(β) with σ(α) = β. The isomorphism σ leaves F(X) fixed, since the coefficients of F(X) are in k. Now the splitting field of F(X) over k(α) is K, since the roots of F(X) are in K and generate K over k(α). Similarly the splitting field of F(X) over k(β) is K(β). Application of Theorem 9.13' yields a field isomorphism φ of K onto K(β) such that φ|_{k(α)} = σ and such that φ carries the roots of F(X) to the roots of F(X). We can express α as a rational expression in y_1, ..., y_n with coefficients in k, and then β = φ(α) is the same rational expression in φ(y_1), ..., φ(y_n), which themselves are members of K. Therefore β is in K, and the conclusion is that M(X) splits in K.

⁵We obtained this inequality in Proposition 9.33 only when K has a single generator over k, but we take this case as indicative of what to expect more generally.

⁶Many books do not restrict the definition to finite extensions. The additional generality of infinite algebraic extensions will not be of benefit for our current purposes, and thus we restrict to finite extensions for now. But in Section VII.6 of Advanced Algebra, we shall enlarge the definition of "normal" to allow infinite algebraic extensions.

Suppose that (b) holds. Let φ be a k isomorphism of K into K̄, and let α be any element of K. The minimal polynomial M(X) of α over k is irreducible and has α as a root in K. By (b), M(X) splits in K. The element φ(α) has to be a root of M(X) since φ fixes the coefficients of M(X), and all the roots of M(X) are assumed to lie in K. Therefore φ(α) lies in K, and (b) implies (c).

Suppose that (c) holds. Since K is a finite algebraic extension of k, we can write K = k(α_1, ..., α_n) for suitable elements α_1, ..., α_n of K. Let P_j(X) be the minimal polynomial of α_j over k, and put F(X) = ∏_{j=1}^{n} P_j(X). Since the roots α_1, ..., α_n generate K over k, it is enough to show that every root of F(X) lies in K, i.e., each root of each P_j(X) lies in K. Let β be a root of P_j(X) in K̄. We know from Theorem 9.11 that there is a k isomorphism φ of k(α_j) onto k(β) with φ(α_j) = β. Theorem 9.23 shows that φ extends to a field mapping of K into K̄, and (c) shows that the extended φ sends K into itself. Therefore β = φ(α_j) lies in K, and all the roots of F(X) in K̄ lie in K. Thus (c) implies (a). □

Now  we  can  put  together  the  properties  of  normal  and  separable  extensions. 
It  will  be  convenient  to  be  able  to  refer  in  this  context  to  the  equivalence  of  (a) 
and  (b)  that  was  proved  in  Proposition  9.34A,  and  thus  we  repeat  the  statement 
of  that  equivalence  here. 

Proposition 9.35. Let K be a finite separable algebraic extension of a field k, so that |Gal(K/k)| ≤ [K : k]. Then the following are equivalent.

(a) K is the splitting field of some F(X) in k[X], i.e., K is normal over k,

(b) every irreducible polynomial M(X) in k[X] with a root in K splits in K, i.e., K contains the splitting field for each such M(X),

(c) |Gal(K/k)| = [K : k],

(d) k = K^G for G = Gal(K/k).

Remarks.  The  equivalence  of  (a)  and  (b)  is  part  of  Proposition  9.34A,  and 
the  fact  that  they  are  equivalent  with  (c)  follows  from  Proposition  9.33  and  the 
Theorem  of  the  Primitive  Element  (Theorem  9.34).  We  prove  that  the  equivalent 
(a),  (b),  and  (c)  imply  (d),  and  that  (d)  implies  (b). 

PROOF. Suppose that the equivalent (a), (b), and (c) hold for K/k. We prove (d). Write G = Gal(K/k), and let k' = K^G. Since every member of Gal(K/k) fixes k', Gal(K/k) ⊆ Gal(K/k'). Meanwhile, (a) for K/k implies (a) for K/k', and K is separable over k' by Proposition 9.32. Since (a) implies (c), (c) holds for both k' and k, and we have

$$[K : k] = |\mathrm{Gal}(K/k)| \le |\mathrm{Gal}(K/k')| = [K : k'].$$

Since k' ⊇ k, Corollary 9.7 gives [K : k] = [K : k'][k' : k] ≥ [K : k'], so equality holds and [k' : k] = 1. Thus k' = k, and (d) holds.

Suppose (d) holds. We prove (b). Let M(X) be an irreducible polynomial in k[X] having a root r in K; we may assume that M(X) is monic. Then M(X) is necessarily the minimal polynomial of r over k. Define

$$J(X) = \prod_{\varphi \in G} \bigl(X - \varphi(r)\bigr). \qquad (*)$$

If φ_0 is in G, then J^{φ_0}(X) is given by replacing each φ(r) by φ_0φ(r), and the product is unchanged because φ ↦ φ_0φ is a permutation of G. Therefore J(X) = J^{φ_0}(X), and J(X) is in K^G[X]. From the assumption in (d), K^G = k. Therefore J(X) is in k[X]. Since J(r) = 0 and since M(X) is the minimal polynomial of r over k, M(X) divides J(X). Over K, J(X) splits because of its definition in (*). By unique factorization in K[X], M(X) must split too. Thus M(X) splits in K[X], and (b) holds. □
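The invariance of J(X) under G can be observed numerically in one case. The sketch assumes, for illustration only, that K is the splitting field K' of X^3 − 2 over Q, whose Galois group is S_3 (Example (2) of Section 8), and that r = ∛2. Each of the three roots of X^3 − 2 is φ(r) for exactly two of the six elements φ, so the product (*) should come out as (X^3 − 2)^2 = X^6 − 4X^3 + 4, a polynomial in Q[X] divisible by M(X) = X^3 − 2. The helper name is hypothetical, and floating-point rounding makes this a check rather than a proof.

```python
import cmath

def expand_from_roots(roots):
    """Coefficients, in ascending powers of X, of the product of (X - root)."""
    coeffs = [1 + 0j]
    for root in roots:
        shifted = [0j] + coeffs                        # X * (current product)
        scaled = [-root * c for c in coeffs] + [0j]    # -root * (current product)
        coeffs = [s + t for s, t in zip(shifted, scaled)]
    return coeffs

cbrt2 = 2 ** (1 / 3)
omega = cmath.exp(2j * cmath.pi / 3)
orbit = [cbrt2, cbrt2 * omega, cbrt2 * omega ** 2]   # the roots of X^3 - 2

# Each root is phi(r) for exactly two of the six phi in the Galois group.
images = [z for z in orbit for _ in range(2)]
J = expand_from_roots(images)
print([round(c.real, 6) for c in J])   # close to [4, 0, 0, -4, 0, 0, 1]
```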

Corollary 9.36. If K is a finite normal separable extension of k and if L is a field with k ⊆ L ⊆ K, then K is a finite normal separable extension of L, and the subgroup H = Gal(K/L) of Gal(K/k) has

$$|H| \cdot [L : k] = |\mathrm{Gal}(K/k)|.$$

PROOF. The field K is a separable extension of the intermediate field L by Proposition 9.32, and it is a normal extension by Proposition 9.35a. Therefore Proposition 9.35c gives |Gal(K/L)| = [K : L], and we have

$$|H| \cdot [L : k] = |\mathrm{Gal}(K/L)| \cdot [L : k] = [K : L]\,[L : k] = [K : k] = |\mathrm{Gal}(K/k)|,$$

the last two equalities holding by Corollary 9.7 and Proposition 9.35c. □

Corollary 9.37. Let K/k be a separable algebraic extension, and suppose that H is a finite subgroup of Gal(K/k). Then K/K^H is a finite normal separable extension, H is the subgroup Gal(K/K^H) of Gal(K/k), and [K : K^H] = |H|.

PROOF. Proposition 9.32 shows that K is separable over K^H. For an arbitrary element x of K, form the polynomial in K[X] given by

$$F(X) = \prod_{\varphi \in H} \bigl(X - \varphi(x)\bigr).$$

If φ_0 is in H, then F^{φ_0}(X) is given by replacing each φ(x) by φ_0φ(x), and the product is unchanged. Therefore F(X) = F^{φ_0}(X), and F(X) is in K^H[X]. Thus F(X) is a polynomial in K^H[X] that has x as a root and splits in K. The minimal polynomial M(X) of x over K^H must divide F(X), and it too has x as a root. By unique factorization in K[X], M(X) must split in K. Thus K/K^H will be a normal extension if it is shown that [K : K^H] < ∞.

The element x has [K^H(x) : K^H] = deg M(X) ≤ deg F(X) = |H|, and the claim is that [K : K^H] ≤ |H|. Assuming the contrary, we could choose elements x_1, ..., x_n of K, with n = |H| + 1, that are linearly independent over K^H. Since every element of K is algebraic over k, hence over K^H, the field K^H(x_1, ..., x_n) is a finite extension of K^H, and [K^H(x_1, ..., x_n) : K^H] > |H|. By the Theorem of the Primitive Element (Theorem 9.34), K^H(x_1, ..., x_n) = K^H(z) for some element z, and therefore [K^H(x_1, ..., x_n) : K^H] = [K^H(z) : K^H] ≤ |H|, contradiction. We conclude that [K : K^H] ≤ |H|. From the previous paragraph, K/K^H is a finite separable normal extension.

The definition of K^H shows that H ⊆ Gal(K/K^H), and Proposition 9.35c gives |Gal(K/K^H)| = [K : K^H]. Putting these facts together with the inequality [K : K^H] ≤ |H| from the previous paragraph, we have

$$|H| \le |\mathrm{Gal}(K/K^H)| = [K : K^H] \le |H|$$

with equality on the left only if H = Gal(K/K^H). Equality must hold throughout the displayed line since the ends are equal, and therefore H = Gal(K/K^H). □


8.  Fundamental  Theorem  of  Galois  Theory 

We  are  now  in  a  position  to  obtain  the  main  result  in  Galois  theory. 

Theorem 9.38 (Fundamental Theorem of Galois Theory). If K is a finite normal separable extension of k, then there is a one-one inclusion-reversing correspondence between the subgroups H of Gal(K/k) and the subfields L of K that contain k, corresponding elements H and L being given by

$$L = K^H \qquad \text{and} \qquad H = \mathrm{Gal}(K/L).$$

The effect of the theorem is to take an extremely difficult problem, namely finding intermediate fields, and reduce it to a problem that is merely difficult, namely finding the Galois group. For example, the finiteness of Gal(K/k) implies that there are only finitely many subgroups of Gal(K/k), and the theorem therefore implies that there are only finitely many intermediate fields; this finiteness of the number of intermediate fields is not so obvious without the theorem.




As a reminder of the availability of Theorem 9.38, Proposition 9.35, and Corollary 9.36, it is customary to refer to a finite normal separable extension as a finite Galois extension.

Before  coming  to  the  proof  of  the  theorem,  let  us  examine  what  the  theorem 
says  for  the  examples  in  Section  6.  In  each  case  the  field  k  is  the  field  Q  of 
rationals.  The  extensions  are  separable  because  the  characteristic  is  0. 

Examples. 

(1a) K = Q(√−1). This is the splitting field⁷ for X^2 + 1. Proposition 9.33 gives |Gal(K/Q)| = [K : Q] = 2. Thus Gal(K/Q) ≅ C_2. There are no nontrivial proper subgroups, and there are consequently no intermediate fields. We knew this already since there cannot be any intermediate Q vector spaces between Q and K. Thus the theorem tells us nothing new.

(1b) K = Q(√2). Similar remarks apply.

(2) K = Q(∛2). This extension is not normal, as a consequence of (b) in Proposition 9.34A. (Namely X^3 − 2 has a root in K but does not split in K.) Theorem 9.38 does not apply to K. If we adjoin r to K with r^2 + (∛2)r + (∛2)^2 = 0, we obtain the splitting field K' for X^3 − 2 over Q. Then K' is a normal extension of Q, and the theorem applies. Since each element of Gal(K'/Q) permutes the three roots of X^3 − 2 and is determined by its effect on these roots, Gal(K'/Q) is isomorphic to a subgroup of the symmetric group S_3. The Galois group Gal(K'/Q) has order [K' : Q] = 6 and hence is isomorphic to the whole symmetric group S_3. The group S_3 has three subgroups of order 2 and one subgroup of order 3. Therefore K' has three intermediate fields of degree 3 and one of degree 2. The intermediate fields of degree 3 are the three fields generated by Q and one of the three roots of X^3 − 2. The intermediate field of degree 2 corresponds to the alternating subgroup of order 3 and is the subfield generated by Q and the cube roots of 1. It is the splitting field for X^2 + X + 1 over Q.
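The subgroup count for S_3 is small enough to confirm by brute force, and Corollary 9.36 turns it into a count of degrees. In the sketch below (helper names are hypothetical) each element of Gal(K'/Q) ≅ S_3 is represented by its permutation of the three roots of X^3 − 2; the nonempty subsets closed under composition are kept, which for a finite group are exactly the subgroups, and each order |H| is converted into the degree [K'^H : Q] = 6/|H|.

```python
from itertools import combinations, permutations

def compose(s, t):
    """The permutation s after t; permutations are tuples of images of 0, 1, 2."""
    return tuple(s[t[i]] for i in range(len(t)))

def subgroups(group):
    """Nonempty subsets closed under composition (the subgroups of a finite group)."""
    found = []
    for size in range(1, len(group) + 1):
        for subset in combinations(group, size):
            members = set(subset)
            if all(compose(a, b) in members for a in members for b in members):
                found.append(members)
    return found

S3 = list(permutations(range(3)))   # permutations of the three roots of X^3 - 2
subs = subgroups(S3)
orders = sorted(len(h) for h in subs)
degrees = sorted(len(S3) // len(h) for h in subs)   # [K'^H : Q] = 6/|H|
print(orders)    # [1, 2, 2, 2, 3, 6]
print(degrees)   # [1, 2, 3, 3, 3, 6]
```

The degrees 1 and 6 correspond to Q and K' themselves; the remaining entries are the three intermediate fields of degree 3 and the one of degree 2.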

(3) K = Q(r), where r is a root of X^3 − X − 1/3. We know from Section 2 that X^3 − X − 1/3 is irreducible over Q and splits in K, and K by definition is therefore normal. Proposition 9.33 tells us that Gal(K/Q) has order 3 and hence is isomorphic to C_3. There are no nontrivial proper subgroups, and Theorem 9.38 tells us that there are no intermediate fields. We could have seen in more elementary fashion that there are no intermediate fields by using Corollary 9.7, since the corollary tells us that the degree of an intermediate field would have to divide 3.
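A quick consistency check, which is not the argument of the text, uses two standard facts: a cubic with no rational root is irreducible over Q, and an irreducible cubic has Galois group C_3 exactly when its discriminant is a square in Q. The sketch assumes the cubic is X^3 − X − 1/3 and uses exact rational arithmetic; the helper names are hypothetical.

```python
from fractions import Fraction

def cubic(x, p, q):
    """Value of X^3 + pX + q at x."""
    return x ** 3 + p * x + q

def cubic_discriminant(p, q):
    """Discriminant of the depressed cubic X^3 + pX + q."""
    return -4 * p ** 3 - 27 * q ** 2

p, q = Fraction(-1), Fraction(-1, 3)   # assumed cubic X^3 - X - 1/3

# Rational root test applied to 3X^3 - 3X - 1: the only candidates are ±1 and ±1/3.
candidates = [Fraction(sign, den) for sign in (1, -1) for den in (1, 3)]
has_rational_root = any(cubic(x, p, q) == 0 for x in candidates)

disc = cubic_discriminant(p, q)
print(has_rational_root, disc)   # False 1
```

The discriminant 4 − 3 = 1 is a square, which agrees with the conclusion that Gal(K/Q) ≅ C_3.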

(4) K = Q(e^{2πi/17}). We have seen that [K : Q] = 16 and that Gal(K/Q) ≅ F_17^× ≅ C_16. Let c be a generator of the cyclic Galois group. Let H_2 = {1, c^8}, H_4 = {1, c^4, c^8, c^12}, and H_8 = {1, c^2, c^4, c^6, c^8, c^10, c^12, c^14}. Then put

$$L_2 = K^{H_2}, \qquad L_4 = K^{H_4}, \qquad L_8 = K^{H_8}.$$

The inclusions among our subgroups are

$$\{1\} \subseteq H_2 \subseteq H_4 \subseteq H_8 \subseteq \mathrm{Gal}(K/Q),$$

and the theorem says that the correspondence with intermediate fields reverses inclusions. Then we have

$$K \supseteq L_2 \supseteq L_4 \supseteq L_8 \supseteq Q.$$

Applying Corollary 9.36, we see that each of these subfields is a quadratic extension of the next-smaller one. Theorem 9.24 says that the members of K are therefore constructible with straightedge and compass. Consequently a regular 17-gon is constructible with straightedge and compass. The constructibility or nonconstructibility of regular n-gons for general n will be settled in similar fashion in the next section. In Section 12 we return to the question of using Galois theory to guide us through the actual steps of the construction when it is possible.

⁷It is customary to regard the algebraic closure of Q as a subfield of C, and thus there is no ambiguity in referring to the splitting field.
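A numerical sketch can make the field L_8 concrete. It assumes, as a choice not made in the text, that c is the automorphism σ_3 : ζ ↦ ζ^3 with ζ = e^{2πi/17}. This is a legitimate generator because 3 is a primitive root mod 17, which the script checks. Under σ_a ↦ a, the subgroup H_8 = {1, c^2, ..., c^14} becomes the set of nonzero squares mod 17. Each element of H_8 permutes the H_8-orbit of ζ, so the orbit sum η is fixed by H_8 and lies in L_8. Its value agrees with (−1 + √17)/2, consistent with the classical fact that L_8 = Q(√17).

```python
import cmath
import math

p = 17
g = 3   # assumed choice of generator: c = sigma_3, where sigma_a sends zeta to zeta^a

# 3 is a primitive root mod 17, so sigma_3 generates Gal(K/Q).
assert sorted(pow(g, k, p) for k in range(p - 1)) == list(range(1, p))

# H_8 = {1, c^2, ..., c^14}, recorded by the exponents a with c^(2k) = sigma_a.
H8 = {pow(g, 2 * k, p) for k in range(8)}

zeta = cmath.exp(2j * math.pi / p)
# Each element of H_8 permutes this orbit, so the sum is fixed by H_8 and lies in L_8.
eta = sum(zeta ** a for a in H8)

print(sorted(H8))                 # [1, 2, 4, 8, 9, 13, 15, 16], the squares mod 17
print(eta)                        # about 1.5616, with negligible imaginary part
print((-1 + math.sqrt(17)) / 2)   # about 1.5616
```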

Proof of Theorem 9.38. The function L ↦ Gal(K/L) has as domain the set of all intermediate fields and takes values in the set of all subgroups of Gal(K/k), since an element in Gal(K/L) is necessarily in Gal(K/k). Each such extension K/L is separable by Proposition 9.32 and is normal by Proposition 9.34A. Thus Proposition 9.35d applies to each K/L and shows that L = K^{Gal(K/L)}. Consequently the function L ↦ Gal(K/L) is one-one. If H is a subgroup of Gal(K/k), then Corollary 9.37 shows that L = K^H is an intermediate field for which H = Gal(K/L), and therefore the function L ↦ Gal(K/L) is onto.

It is immediate from the definition of Galois group that L_1 ⊆ L_2 implies Gal(K/L_1) ⊇ Gal(K/L_2), and it is immediate from the formula L = K^{Gal(K/L)} that Gal(K/L_1) ⊇ Gal(K/L_2) implies L_1 ⊆ L_2. This completes the proof. □

Corollary 9.39. If K is a finite Galois extension of k and if L is a subfield of K that contains k, then L is a normal extension of k if and only if Gal(K/L) is a normal subgroup of Gal(K/k). In this case, the map Gal(K/k) → Gal(L/k) given by restriction from K to L is a group homomorphism that descends to a group isomorphism

$$\mathrm{Gal}(K/k)/\mathrm{Gal}(K/L) \cong \mathrm{Gal}(L/k).$$
Proof. Let L correspond to H = Gal(K/L) in Theorem 9.38, so that L = K^H. If φ is in Gal(K/k), then

$$\begin{aligned}
K^{\varphi H \varphi^{-1}} &= \{x \in K \mid \varphi h \varphi^{-1}(x) = x \text{ for all } h \in H\} \\
&= \{\varphi(x') \in K \mid \varphi h(x') = \varphi(x') \text{ for all } h \in H\} \\
&= \{\varphi(x') \in K \mid h(x') = x' \text{ for all } h \in H\} \\
&= \varphi(K^H) = \varphi(L).
\end{aligned}$$

Since the correspondence of Theorem 9.38 is one-one onto, φHφ^{−1} = H if and only if φ(L) = L. Therefore H is a normal subgroup of Gal(K/k) if and only if φ(L) = L for all φ ∈ Gal(K/k).

Now suppose that H is a normal subgroup of Gal(K/k). We have just seen that φ(L) = L for all φ ∈ Gal(K/k). Then each φ defines by restriction a member φ̄ = φ|_L of Gal(L/k), and φ ↦ φ̄ is certainly a group homomorphism. The kernel of φ ↦ φ̄ is the subgroup of Gal(K/k) given by

$$\{\varphi \in \mathrm{Gal}(K/k) \mid \varphi|_L = 1\},$$

and this is just Gal(K/L). Thus φ ↦ φ̄ descends to a one-one homomorphism of Gal(K/k)/Gal(K/L) into Gal(L/k), and we have

$$|\mathrm{Gal}(K/k)| \,/\, |\mathrm{Gal}(K/L)| \le |\mathrm{Gal}(L/k)|.$$

We make use of Corollary 9.7 relating degrees of extensions. Applying Proposition 9.35c to K/k and K/L, as well as Proposition 9.33 to L/k (which is possible because L/k is separable, so that L = k(y) for some y by Theorem 9.34), we obtain

$$\begin{aligned}
[L : k] &= [K : k]/[K : L] \\
&= |\mathrm{Gal}(K/k)|/|\mathrm{Gal}(K/L)| \\
&\le |\mathrm{Gal}(L/k)| \le [L : k],
\end{aligned}$$

with equality at the first ≤ sign only if φ ↦ φ̄ is onto Gal(L/k) and with equality at the second ≤ sign only if L is the splitting field over k of the minimal polynomial of the element y. Equality must hold in both cases because the end members of the display are equal, and we conclude that φ ↦ φ̄ is onto and that L/k is a normal extension.

We are left with proving that if L/k is a normal extension, then H is a normal subgroup of Gal(K/k). Thus let L/k be normal. In view of the conclusion of the first paragraph of the proof, it is enough to prove that φ(L) = L for all φ ∈ Gal(K/k). By definition of normal extension, L is the splitting field of some polynomial F(X) in k[X]. We may assume that F(X) is monic. Let us write

$$F(X) = (X - x_1) \cdots (X - x_n) \qquad \text{with all } x_j \text{ in } L.$$

Applying a given member φ of Gal(K/k) to the coefficients, we obtain

$$F(X) = \bigl(X - \varphi(x_1)\bigr) \cdots \bigl(X - \varphi(x_n)\bigr),$$

and here the φ(x_j)'s are known only to be in K. By unique factorization in K[X], φ(x_i) = x_{j(i)} for some j = j(i). Therefore φ(x_i) is in L for all i. Since L is the splitting field of F(X) over k, L = k(x_1, ..., x_n). Thus φ maps L into L, and since φ is one-one and k linear and L is finite-dimensional over k, φ(L) = L. □

The examples of Galois groups given in Section 6 all involved fields that are finite extensions of the rationals Q. As we shall see in Section 17, it is important for the understanding of Galois groups of finite extensions of Q to be able to identify Galois groups of finite extensions of finite fields. This matter is addressed in the following proposition.

Proposition 9.40. Let K be a finite extension of the finite field F_q, where q is a power of the prime p, and suppose that [K : F_q] = n. Then K is a Galois extension of F_q, the Galois group Gal(K/F_q) is cyclic of order n, and a generator is the qth-power Frobenius automorphism x ↦ x^q.

PROOF. Theorem 9.14 shows that K is a splitting field for X^{q^n} − X over F_p. Hence it is a splitting field for X^{q^n} − X over F_q, and K/F_q is a normal extension. The polynomial X^{q^n} − X has no multiple roots, since its derivative q^n X^{q^n − 1} − 1 equals −1 in characteristic p, and it follows that K/F_q is a separable extension.

Define φ by φ(x) = x^q. Lemma 9.18 shows that φ is an automorphism of K. Since every member of F_q^× has order dividing q − 1, every nonzero element of F_q is fixed by φ. The map φ certainly carries 0 to 0, and thus φ is in Gal(K/F_q). By a similar argument, applied to the group K^× of order q^n − 1, φ^n fixes every element of K, and hence φ^n = 1. Corollary 4.27 shows that K^× is cyclic, hence that there exists an element y in K^× such that y^l ≠ 1 for 1 ≤ l ≤ q^n − 2. This y has y^l ≠ y for 2 ≤ l ≤ q^n − 1. Then φ^k(y) = y^{q^k} cannot be y for 1 ≤ k ≤ n − 1, since 2 ≤ q^k ≤ q^n − 1, and φ must have order exactly n. This shows that φ generates a cyclic subgroup of order n in Gal(K/F_q). Since n is an upper bound for the order of Gal(K/F_q) by Proposition 9.33, this cyclic subgroup exhausts the Galois group. □
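Proposition 9.40 can be verified by hand in the smallest interesting case, q = 2 and n = 3. The sketch assumes the concrete model F_8 = F_2[X]/(X^3 + X + 1); this cubic has no root in F_2, so it is irreducible. An element a_0 + a_1X + a_2X^2 is encoded as the integer with bits a_0, a_1, a_2. The script checks that x ↦ x^2 has order exactly 3 on F_8 and that its fixed points are 0 and 1, that is, exactly F_2. The names are hypothetical.

```python
MODULUS = 0b1011   # X^3 + X + 1, irreducible over F_2 (no root in F_2)
DEGREE = 3

def gf_mul(a, b):
    """Multiply two elements of F_8 = F_2[X]/(X^3 + X + 1), encoded as 3-bit integers."""
    result = 0
    while b:
        if b & 1:
            result ^= a            # add (XOR) the current shifted copy of a
        b >>= 1
        a <<= 1                    # multiply a by X
        if (a >> DEGREE) & 1:      # reduce using X^3 = X + 1
            a ^= MODULUS
    return result

def frobenius(x):
    """The qth-power map x -> x^q with q = 2."""
    return gf_mul(x, x)

def order_of_frobenius():
    """Smallest m >= 1 such that frobenius^m is the identity on all of F_8."""
    identity = list(range(8))
    current = identity
    m = 0
    while True:
        current = [frobenius(x) for x in current]
        m += 1
        if current == identity:
            return m

fixed = [x for x in range(8) if frobenius(x) == x]
print(order_of_frobenius(), fixed)   # 3 [0, 1]
```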

Example. Suppose that we are given a polynomial with coefficients in F_p and we want to find the Galois group of a splitting field. Since there are efficient computer programs for factoring the polynomial into irreducible polynomials, let us take that factorization as done. The Galois group will be cyclic of some order with generator the Frobenius automorphism x ↦ x^p. For an irreducible polynomial of degree n, a splitting field has degree n, and the smallest power of x ↦ x^p that gives the identity is the nth power. The conclusion is that the Galois group is cyclic of order equal to the least common multiple of the degrees of the irreducible constituents, a generator being the Frobenius automorphism.
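The text takes the factorization as given. As a hedged illustration of how the degrees might be obtained, the sketch below uses distinct-degree factorization. This is a standard technique substituted here and is not described in the text. For a squarefree f over F_p with p prime, GCD(f, X^{p^d} − X) is the product of the irreducible factors of f of degree d. The order of the Galois group is then the least common multiple of those degrees. Squarefreeness of f and primality of p are assumptions of the sketch, and all function names are hypothetical.

```python
from functools import reduce
from math import gcd

# Polynomials over F_p are lists of coefficients in ascending powers of X;
# the zero polynomial is the empty list.

def trim(a):
    while a and a[-1] == 0:
        a.pop()
    return a

def poly_sub(a, b, p):
    n = max(len(a), len(b))
    return trim([((a[i] if i < len(a) else 0) - (b[i] if i < len(b) else 0)) % p
                 for i in range(n)])

def poly_mul(a, b, p):
    if not a or not b:
        return []
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % p
    return trim(out)

def poly_divmod(a, b, p):
    """Quotient and remainder of a by a nonzero b over F_p (p prime)."""
    a = trim([c % p for c in a])
    inv = pow(b[-1], p - 2, p)              # inverse of the leading coefficient of b
    q = [0] * max(len(a) - len(b) + 1, 0)
    while len(a) >= len(b):
        shift = len(a) - len(b)
        coef = a[-1] * inv % p
        q[shift] = coef
        for j, y in enumerate(b):
            a[shift + j] = (a[shift + j] - coef * y) % p
        trim(a)
    return trim(q), a

def make_monic(a, p):
    inv = pow(a[-1], p - 2, p)
    return [c * inv % p for c in a]

def poly_gcd(a, b, p):
    while b:
        a, b = b, poly_divmod(a, b, p)[1]
    return make_monic(a, p)

def poly_powmod(base, e, mod, p):
    result = [1]
    base = poly_divmod(base, mod, p)[1]
    while e:
        if e & 1:
            result = poly_divmod(poly_mul(result, base, p), mod, p)[1]
        base = poly_divmod(poly_mul(base, base, p), mod, p)[1]
        e >>= 1
    return result

def factor_degrees(f, p):
    """Degrees of the irreducible factors of a squarefree f over F_p."""
    f = make_monic(trim([c % p for c in f]), p)
    x = [0, 1]
    h, d, degrees = x, 0, []
    while len(f) - 1 >= 2 * (d + 1):
        d += 1
        h = poly_powmod(h, p, f, p)              # h = X^(p^d) mod f
        g = poly_gcd(f, poly_sub(h, x, p), p)    # product of the factors of degree d
        if len(g) > 1:
            degrees += [d] * ((len(g) - 1) // d)
            f = poly_divmod(f, g, p)[0]
            h = poly_divmod(h, f, p)[1]
    if len(f) > 1:                               # what remains is irreducible
        degrees.append(len(f) - 1)
    return sorted(degrees)

def galois_group_order(f, p):
    """Order of the cyclic Galois group: the lcm of the factor degrees."""
    return reduce(lambda a, b: a * b // gcd(a, b), factor_degrees(f, p), 1)

# X^5 + X^4 + 1 = (X^2 + X + 1)(X^3 + X + 1) over F_2
f = [1, 0, 0, 0, 1, 1]
print(factor_degrees(f, 2), galois_group_order(f, 2))   # [2, 3] 6
```

For instance, X^3 − 2 is irreducible over F_7 (2 is not a cube mod 7), giving a Galois group of order 3, while over F_5 it factors with degrees 1 and 2, giving order 2.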
9. Application to Constructibility of Regular Polygons

In this section we use Galois theory to give a proof of Theorem 9.25 concerning the constructibility of regular n-gons. Let us recall the statement.

Theorem 9.25 (Gauss). A regular n-gon is constructible with straightedge and compass if and only if n is the product of distinct Fermat primes and a power of 2.
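Gauss's criterion is easy to apply mechanically. The sketch below is one way to do so, and its function names are hypothetical. It strips the factors of 2, then checks that each remaining prime factor occurs only once and has the form 2^{2^N} + 1. Primality of a factor of Fermat form is tested by trial division, which is adequate only for small inputs.

```python
def is_fermat_prime(m):
    """True when m is a prime of the form 2^(2^N) + 1 (primality by trial division)."""
    if m < 3 or ((m - 1) & (m - 2)) != 0:   # m - 1 must be a power of 2
        return False
    e = (m - 1).bit_length() - 1            # m = 2^e + 1
    if (e & (e - 1)) != 0:                  # e itself must be a power of 2
        return False
    return all(m % d for d in range(2, int(m ** 0.5) + 1))

def regular_polygon_constructible(n):
    """Gauss's criterion: n is a power of 2 times a product of distinct Fermat primes."""
    while n % 2 == 0:
        n //= 2
    d = 3
    while n > 1:
        if n % d == 0:                      # d is the smallest odd prime factor left
            n //= d
            if n % d == 0 or not is_fermat_prime(d):   # repeated or non-Fermat factor
                return False
        d += 2
    return True

print([n for n in range(3, 20) if regular_polygon_constructible(n)])
# [3, 4, 5, 6, 8, 10, 12, 15, 16, 17]
```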

Proof of sufficiency. First suppose that n is a Fermat prime n = 2^{2^N} + 1. Let K = Q(e^{2πi/n}). We saw in Section 5 that the degree [K : Q] is 2^{2^N}, hence is a power of 2. Furthermore we know that K is a separable extension of Q, being of characteristic 0, and it is normal, being the splitting field for X^n − 1 over Q.

In Section 6 we saw that the Galois group Gal(K/Q) is cyclic of order 2^{2^N}. Let c be a generator of this group. For each integer k with 0 ≤ k ≤ 2^N, let H_{2^k} be the unique cyclic subgroup of Gal(K/Q) of order 2^k. For this subgroup, c^{2^{2^N − k}} is a generator. Put L_{2^k} = K^{H_{2^k}}. Then we have inclusions

$$\{1\} \subseteq H_2 \subseteq H_{2^2} \subseteq \cdots \subseteq H_{2^k} \subseteq \cdots \subseteq H_{2^{2^N - 1}} \subseteq H_{2^{2^N}} = \mathrm{Gal}(K/Q),$$

the index being 2 at each stage. Theorem 9.38 says that the correspondence with intermediate fields reverses inclusions, and Corollary 9.36 shows that the degree of each consecutive extension of subfields matches the index of the corresponding consecutive subgroups. The intermediate fields are therefore of the form

$$K \supseteq L_2 \supseteq L_{2^2} \supseteq \cdots \supseteq L_{2^k} \supseteq \cdots \supseteq L_{2^{2^N - 1}} \supseteq L_{2^{2^N}} = Q,$$

and the degree in each case is 2. In view of the formula for the roots of a quadratic polynomial, each extension is obtained by adjoining some square root. By Theorem 9.24 the members of K are constructible with straightedge and compass. In particular, e^{2πi/n} is constructible, and a regular n-gon is constructible.

Next suppose that e^{2πi/r} and e^{2πi/s} are both constructible and that GCD(r, s) = 1. Choose integers a and b with ar + bs = 1, so that a/s + b/r = 1/(rs). Then the equality

$$\bigl(e^{2\pi i/s}\bigr)^a \bigl(e^{2\pi i/r}\bigr)^b = e^{2\pi i/(rs)}$$

shows that e^{2πi/(rs)} is constructible. This proves the sufficiency for any product of distinct Fermat primes. Bisection of an angle is always possible with straightedge and compass, as was observed in the third paragraph of Section 5, and the proof of the sufficiency in Theorem 9.25 is therefore complete. □
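The exponent bookkeeping in this step can be checked numerically. The sketch below finds a and b with ar + bs = 1 by the extended Euclidean algorithm. It then compares (e^{2πi/s})^a (e^{2πi/r})^b with e^{2πi/(rs)} in floating point. The values r = 3 and s = 5, which give the regular 15-gon, are illustrative choices, and the function names are hypothetical.

```python
import cmath

def extended_gcd(r, s):
    """Return (g, a, b) with a*r + b*s = g = GCD(r, s)."""
    if s == 0:
        return r, 1, 0
    g, x, y = extended_gcd(s, r % s)
    return g, y, x - (r // s) * y

def root_of_unity(m):
    """The complex number e^(2 pi i / m)."""
    return cmath.exp(2j * cmath.pi / m)

r, s = 3, 5
g, a, b = extended_gcd(r, s)
assert g == 1 and a * r + b * s == 1
combined = root_of_unity(s) ** a * root_of_unity(r) ** b
print(a, b, abs(combined - root_of_unity(r * s)))   # 2 -1, then a rounding-level difference
```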

Remarks.  The  above  proof  shows  that  the  construction  is  possible,  but  it  gives 
little  clue  how  to  carry  out  the  construction.  We  shall  address  this  matter  further 
in  Section  12. 
We turn our attention to the necessity, that n has to be the product of distinct Fermat primes and a power of 2 if a regular n-gon is constructible. For the moment let n ≥ 1 be any integer. Let us consider the distinct nth roots of 1 in C, which are e^{2πik/n} for 0 ≤ k < n. The order of each of these elements divides n, and the order is exactly n if and only if GCD(k, n) = 1. In this case we say that e^{2πik/n} is a primitive nth root of 1. Define the cyclotomic polynomial Φ_n(X) by

$$\Phi_n(X) = \prod_{\substack{0 \le k < n \\ \mathrm{GCD}(k, n) = 1}} \bigl(X - e^{2\pi i k/n}\bigr).$$

Each such polynomial is monic by inspection. The splitting field Q(e^{2πi/n}) in C is called a cyclotomic field. Since the complex roots of X^n − 1 are exactly the numbers e^{2πik/n}, and since each of them is a primitive dth root of 1 for exactly one positive divisor d of n (namely its order), we have

$$X^n - 1 = \prod_{d \mid n} \Phi_d(X),$$

the product being taken over the positive divisors d of n.

Lemma 9.41. Each cyclotomic polynomial Φ_n(X) lies in Z[X], and the degree of Φ_n(X) is φ(n), where φ is the Euler φ function defined just before Corollary 1.10.

PROOF. The assertion about the degree is immediate from the definition, since Φ_n(X) has one factor for each k with 0 ≤ k < n and GCD(k, n) = 1, and there are φ(n) such k. We know that Φ_n(X) is in C[X], and we begin by showing by induction on n that Φ_n(X) is in Q[X]. For n = 1, we have Φ_1(X) = X − 1, and the assertion is true. If it is true for all d with 1 ≤ d < n, then the formula X^n − 1 = ∏_{d|n} Φ_d(X) and induction show that X^n − 1 = Φ_n(X)F(X) for some F(X) in Q[X]. By the division algorithm, X^n − 1 = F(X)Q(X) + R(X) for polynomials Q(X) and R(X) in Q[X] with R(X) = 0 or deg R(X) < deg F(X). Subtraction gives F(X)(Φ_n(X) − Q(X)) = R(X) in C[X]. If R(X) is not 0, then Φ_n(X) − Q(X) is not 0 either, and deg R(X) < deg F(X) gives a contradiction. Therefore R(X) = 0 and F(X)(Φ_n(X) − Q(X)) = 0. Since C[X] is an integral domain and F(X) ≠ 0, Φ_n(X) = Q(X). Thus Φ_n(X) is in Q[X], and the induction is complete.

To see that Φ_n(X) is in Z[X], we again induct, the case n = 1 being clear. The formula X^n − 1 = ∏_{d|n} Φ_d(X) and induction show that X^n − 1 = Φ_n(X)F(X) for some F(X) in Z[X]. Since Φ_n(X) is known to be in Q[X], Corollary 8.20c shows that Φ_n(X) is in Z[X], and the induction is complete. □
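The induction in this proof is effectively an algorithm: starting from X^n − 1, divide by Φ_d(X) for each proper divisor d of n. The sketch below, with hypothetical function names, carries this out in exact integer arithmetic. Long division by a monic integer polynomial never leaves Z, and the code asserts that each division has zero remainder. It represents polynomials by coefficient tuples in ascending powers of X.

```python
from functools import lru_cache

def divide_exact(numerator, denominator):
    """Quotient of integer polynomials (ascending coefficients); denominator must be monic.

    Raises AssertionError if the division leaves a remainder.
    """
    num = list(numerator)
    deg_den = len(denominator) - 1
    quotient = [0] * (len(num) - deg_den)
    for i in range(len(quotient) - 1, -1, -1):
        coef = num[i + deg_den]              # leading term still to be cancelled
        quotient[i] = coef
        for j, d in enumerate(denominator):
            num[i + j] -= coef * d
    assert not any(num), "division was not exact"
    return quotient

@lru_cache(maxsize=None)
def cyclotomic(n):
    """Phi_n(X) obtained from X^n - 1 = prod over d | n of Phi_d(X)."""
    poly = [-1] + [0] * (n - 1) + [1]        # X^n - 1
    for d in range(1, n):
        if n % d == 0:
            poly = divide_exact(poly, cyclotomic(d))
    return tuple(poly)

print(cyclotomic(6))              # (1, -1, 1), i.e. X^2 - X + 1
print(len(cyclotomic(17)) - 1)    # 16, the value of the Euler function at 17
```

The integer coefficients need not all lie in {−1, 0, 1}; the smallest n for which a coefficient −2 appears is n = 105.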

Lemma 9.42. Each cyclotomic polynomial Φ_n(X) is irreducible as a member of Q[X].
PROOF. Let ζ be a primitive nth root of 1, let p be a prime number not dividing n, let F(X) be the minimal polynomial of ζ over Q, and let G(X) be the minimal polynomial of ζ^p. The main step is to show that F(X) = G(X).

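To carry out this step, we observe that $F(\zeta) = G(\zeta^p) = 0$ and that $F(X)$ and $G(X)$ must divide $\Phi_n(X)$, since $\zeta$ and $\zeta^p$ are both primitive $n$th roots of 1. Arguing by contradiction, suppose that $F(X) \ne G(X)$. Then $\mathrm{GCD}(F, G) = 1$ since $F(X)$ and $G(X)$ are monic and irreducible over $\mathbb{Q}$, and therefore $F(X)G(X)$ divides $\Phi_n(X)$. Hence we can write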

$$X^n - 1 = F(X)G(X)H(X),$$

and $H(X)$ is a monic member of $\mathbb{Z}[X]$ by Lemma 9.41 and Corollary 8.20c. Since $\zeta$ is a root of $G(X^p)$, we must have $G(X^p) = F(X)M(X)$ for some monic polynomial $M(X)$ in $\mathbb{Z}[X]$. We apply the substitution homomorphism $\mathbb{Z}[X] \to \mathbb{F}_p[X]$ that carries $X$ to $X$ and reduces the coefficients modulo $p$; the mapping on the coefficients will be denoted by a bar. Then we have

$$X^n - 1 = \bar F(X)\bar G(X)\bar H(X) \qquad\text{and}\qquad \bar G(X)^p = \bar G(X^p) = \bar F(X)\bar M(X),$$

the equality $\bar G(X)^p = \bar G(X^p)$ following from Lemma 9.18. If $Q(X)$ is a prime factor of $\bar F(X)$, then $Q(X)$ divides $\bar G(X)^p$ and therefore must divide $\bar G(X)$. So $Q(X)^2$ divides $X^n - 1$. Therefore $X^n - 1$ has multiple roots in its splitting field over $\mathbb{F}_p$, in contradiction to Corollary 9.17 and the fact that the derivative $nX^{n-1}$ of $X^n - 1$ is nonzero at each nonzero element of that splitting field (since $\mathrm{GCD}(p, n) = 1$ by assumption). We conclude that $F(X) = G(X)$.

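Now suppose that $r$ is a positive integer with $\mathrm{GCD}(r, n) = 1$. Then we can write $r = p_1 \cdots p_l$ with each $p_j$ a prime not dividing $n$, and applying the main step in turn to the primitive $n$th roots $\zeta, \zeta^{p_1}, \zeta^{p_1 p_2}, \dots$, we see inductively that $\zeta^r$ has $F(X)$ as minimal polynomial. Thus $F(X)$ has at least $\varphi(n)$ roots, namely the distinct powers $\zeta^r$ with $1 \le r \le n$ and $\mathrm{GCD}(r, n) = 1$. Since $F(X)$ divides $\Phi_n(X)$, which has degree $\varphi(n)$, we must have $F(X) = \Phi_n(X)$. Therefore $\Phi_n(X)$ is irreducible over $\mathbb{Q}$. □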
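The decisive congruence in this proof is $\bar G(X)^p = \bar G(X^p)$ in $\mathbb{F}_p[X]$, which rests on $p$th powering being additive in characteristic $p$ and fixing $\mathbb{F}_p$ (Lemma 9.18). A minimal Python spot check of that identity for sample integer polynomials follows; the sample polynomials and function names are arbitrary choices made for illustration. The last case in the usage note shows that the identity genuinely needs the modulus to be prime.

```python
def poly_mul_mod(a, b, p):
    """Product of two polynomials (lowest degree first), coefficients reduced mod p."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % p
    return out

def poly_pow_mod(a, e, p):
    """a(X)^e with coefficients reduced mod p, by repeated multiplication."""
    result = [1]
    for _ in range(e):
        result = poly_mul_mod(result, a, p)
    return result

def substitute_x_to_the(a, e):
    """Coefficients of a(X^e)."""
    out = [0] * ((len(a) - 1) * e + 1)
    for i, c in enumerate(a):
        out[i * e] = c
    return out

def frobenius_identity_holds(g, p):
    """Compare g(X)^p with g(X^p) after reducing coefficients mod p."""
    lhs = poly_pow_mod(g, p, p)
    rhs = [c % p for c in substitute_x_to_the(g, p)]
    return lhs == rhs
```

For example, $(X^2 + X + 1)^3 \equiv X^6 + X^3 + 1 \pmod 3$, so `frobenius_identity_holds([1, 1, 1], 3)` is true, whereas with the composite modulus 4 the check fails for $1 + X$.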

Proof of necessity in Theorem 9.25. Theorem 9.24 shows that the degree $[\mathbb{Q}(e^{2\pi i/n}) : \mathbb{Q}]$ must be a power of 2 if a regular $n$-gon is constructible. Since $e^{2\pi i/n}$ is a root of $\Phi_n(X)$ and since Lemma 9.42 shows $\Phi_n(X)$ to be irreducible over $\mathbb{Q}$, $\Phi_n(X)$ is the minimal polynomial of $e^{2\pi i/n}$ over $\mathbb{Q}$. By Lemma 9.41 the degree in question is given by $[\mathbb{Q}(e^{2\pi i/n}) : \mathbb{Q}] = \varphi(n)$, where $\varphi$ is the Euler $\varphi$ function. Corollary 1.10 shows that if $n = p_1^{k_1} \cdots p_r^{k_r}$ is a prime factorization of $n$ into distinct prime powers with each $k_j > 0$, then

$$\varphi(n) = \prod_{j=1}^{r} p_j^{k_j - 1}(p_j - 1).$$

For constructibility this must be a power of 2. Then each $p_j$ dividing $n$ must be 1 more than a power of 2, i.e., must be 2 or a Fermat prime, and the only $p_j$ allowed to have $p_j^2$ dividing $n$ is $p_j = 2$. □
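The arithmetic in this argument is easy to test by machine. The Python sketch below computes $\varphi(n)$ from the prime factorization exactly as in the displayed product formula, and compares "$\varphi(n)$ is a power of 2" with "$n$ is a power of 2 times distinct primes of the form $2^j + 1$". All function names are hypothetical; this checks only the number theory, not constructibility itself.

```python
def prime_factorization(n):
    """Return {p: k} with n = product of p**k, by trial division."""
    factors, p = {}, 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

def euler_phi(n):
    """phi(n) = product of p^(k-1) (p - 1), as in Corollary 1.10."""
    result = 1
    for p, k in prime_factorization(n).items():
        result *= p ** (k - 1) * (p - 1)
    return result

def is_power_of_two(m):
    return m > 0 and m & (m - 1) == 0

def has_gauss_shape(n):
    """True if n is a power of 2 times distinct odd primes p with p - 1 a power of 2."""
    for p, k in prime_factorization(n).items():
        if p == 2:
            continue
        if k > 1 or not is_power_of_two(p - 1):
            return False
    return True

candidates = [n for n in range(3, 21) if is_power_of_two(euler_phi(n))]
```

Here `candidates` lists the $n$ between 3 and 20 that pass the necessary condition; a prime of the form $2^j + 1$ automatically has $j$ a power of 2, so it is a Fermat prime.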
10. Application to Proving the Fundamental Theorem of Algebra

In  this  section  we  use  Galois  theory  to  give  a  proof  of  the  Fundamental  Theorem 
of  Algebra.  Let  us  recall  the  statement. 

Theorem 1.18 (Fundamental Theorem of Algebra). Any polynomial in $\mathbb{C}[X]$ with degree $\ge 1$ has at least one root.

We  begin  with  a  lemma  that  handles  three  easy  special  cases. 

Lemma 9.43. There are no finite extensions of $\mathbb{R}$ of odd degree greater than 1, the only extension of $\mathbb{R}$ of degree 2 up to $\mathbb{R}$ isomorphism is $\mathbb{C}$, and there are no finite extensions of $\mathbb{C}$ of degree 2.

PROOF. If $K$ is a finite extension of $\mathbb{R}$ of odd degree and if $x$ is in $K$, then $[\mathbb{R}(x) : \mathbb{R}]$ divides $[K : \mathbb{R}]$ and is therefore odd, and consequently the minimal polynomial $F(X)$ of $x$ over $\mathbb{R}$ is irreducible of odd degree. By Proposition 1.20, which is derived from the Intermediate Value Theorem of Section A3 of the appendix, $F(X)$ has at least one root in $\mathbb{R}$. Therefore $F(X)$ has degree 1, and $x$ is in $\mathbb{R}$.

If $F(X)$ is an irreducible polynomial in $\mathbb{R}[X]$ of degree 2, then $F(X)$ splits in $\mathbb{C}$ by the quadratic formula, and hence the only extension of $\mathbb{R}$ of degree 2 is $\mathbb{C}$, up to $\mathbb{R}$ isomorphism, by the uniqueness of splitting fields (Theorem 9.13).

Let $G(X) = X^2 + bX + c$ be a polynomial in $\mathbb{C}[X]$ of degree 2. Then $G(X)$ has a root $x$ in $\mathbb{C}$ given by the quadratic formula since every member of $\mathbb{C}$ has a square root⁸ in $\mathbb{C}$, and $G(X)$ cannot be irreducible. Since any finite extension of $\mathbb{C}$ of degree 2 would have to be of the form $\mathbb{C}(x)$, with $x$ equal to a root of an irreducible quadratic polynomial over $\mathbb{C}$, there can be no such extension. □
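The square root recipe of footnote 8, combined with the quadratic formula, produces the root of $G(X)$ used in this proof. A floating-point Python sketch of that recipe follows; the names `complex_sqrt` and `quadratic_root` are illustrative, and the `max(0.0, ...)` guards only protect against rounding, since $\sqrt{c^2 + d^2} \ge |c|$ exactly.

```python
import math

def complex_sqrt(z: complex) -> complex:
    """Square root of c + di with a^2 = (c + r)/2, b^2 = (-c + r)/2, sgn(ab) = sgn d."""
    c, d = z.real, z.imag
    r = math.hypot(c, d)                      # sqrt(c^2 + d^2)
    a = math.sqrt(max(0.0, (c + r) / 2))
    b = math.sqrt(max(0.0, (-c + r) / 2))
    if d < 0:
        b = -b                                # make the sign of ab agree with that of d
    return complex(a, b)

def quadratic_root(b: complex, c: complex) -> complex:
    """One root of X^2 + bX + c, by the quadratic formula."""
    return (-b + complex_sqrt(b * b - 4 * c)) / 2
```

For example, the recipe gives $a = 2$, $b = 1$ for $3 + 4i$, so `complex_sqrt(3 + 4j)` is $2 + i$ up to rounding.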

Proof of Theorem 1.18. First let us show that every irreducible member $F(X)$ of $\mathbb{R}[X]$ splits over $\mathbb{C}$. Let $K$ be a splitting field for $F(X)$. Say that $[K : \mathbb{R}] = 2^m N$ with $N$ odd. Then $K$ is a Galois extension of $\mathbb{R}$, and $|\mathrm{Gal}(K/\mathbb{R})| = 2^m N$. By the Sylow Theorems (particularly Theorem 4.59a), $\mathrm{Gal}(K/\mathbb{R})$ has a Sylow 2-subgroup $H$, and $|H| = 2^m$. The field $L = K^H$ that corresponds to $H$ under Theorem 9.38 has $[L : \mathbb{R}] = N$ with $N$ odd, and the first conclusion of Lemma 9.43 shows that $N = 1$. Thus $|\mathrm{Gal}(K/\mathbb{R})| = 2^m$. Corollary 4.40 shows that $\mathrm{Gal}(K/\mathbb{R})$ has nested subgroups of all orders $2^{m-k}$ with $0 \le k \le m$, and Theorem 9.38 says that the corresponding fixed fields are nested and have respective degrees $2^k$ with $0 \le k \le m$. The extension field of $\mathbb{R}$ for $k = 1$ is necessarily $\mathbb{C}$ by Lemma 9.43, and Lemma 9.43 shows that there

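⁸To see that every member of $\mathbb{C}$ has a square root in $\mathbb{C}$, let $c + di$ be given with $c$ and $d$ real and with $d \ne 0$. Let $a$ and $b$ be real numbers with $a^2 = \frac12\bigl(c + \sqrt{c^2 + d^2}\bigr)$, $b^2 = \frac12\bigl(-c + \sqrt{c^2 + d^2}\bigr)$, and $\operatorname{sgn}(ab) = \operatorname{sgn} d$. Then $(a + bi)^2 = c + di$.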
are no quadratic extensions of $\mathbb{C}$. Therefore $m = 0$ or $m = 1$, and the possible splitting fields for $F(X)$ are $\mathbb{R}$ and $\mathbb{C}$ in the two cases.

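To complete the proof, suppose that $K$ is a finite algebraic extension of $\mathbb{C}$ of degree $n$. Then $K$ is a finite algebraic extension of $\mathbb{R}$ of degree $2n$. The Theorem of the Primitive Element allows us to write $K = \mathbb{R}(x)$ for some $x \in K$, and the minimal polynomial of $x$ over $\mathbb{R}$ necessarily has degree $2n$. The previous paragraph shows that this polynomial splits in $\mathbb{C}$. Thus $x$ is in $\mathbb{C}$, and $K = \mathbb{C}$. Finally, if $P(X)$ in $\mathbb{C}[X]$ has degree $\ge 1$, adjoining to $\mathbb{C}$ a root of an irreducible factor of $P(X)$ gives a finite extension $K$ of $\mathbb{C}$; since $K = \mathbb{C}$, that root lies in $\mathbb{C}$. This completes the proof. □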


11.  Application  to  Unsolvability  of  Polynomial 
Equations  with  Nonsolvable  Galois  Group 

The quadratic formula for finding the roots of a quadratic polynomial has in principle been known since the time of the Babylonians about 400 B.C.⁹ The corresponding problem of finding roots of cubics was unsolved until the sixteenth century, and Cardan's formula was discovered at that time. The original formula assumes real coefficients and was in two parts, a first case corresponding to what we now view as one real root and two complex roots, the second case corresponding to what we view as three real roots.¹⁰ There is a similar formula,
but  more  complicated,  for  solving  quartics.  Further  centuries  passed  with  no 
progress  on  finding  a  corresponding  formula  for  the  roots  of  a  polynomial  of 
degree  5  or  higher.  The  introduction  of  Galois  theory  in  the  early  nineteenth 
century  made  it  possible  to  prove  a  surprising  negative  statement  about  all  degrees 
beyond  4. 

Suppose that we are given a polynomial equation with coefficients in the field $\mathbb{Q}$ or a more general field $k$ of characteristic 0. In this section we use Galois theory to address the question whether the roots of the equation in a splitting field can be expressed in terms of $k$ and the adjunction of finitely many $n$th roots to the field, for various values of $n$. For the moment let us say in this case that the roots are “expressible in terms of the members of $k$ and radicals.” We shall make this notion more precise shortly.

Recall  from  Section  IV.8  that  with  a  finite  group  G,  we  can  find  a  strictly 
decreasing  sequence  of  subgroups  starting  with  G  and  ending  with  {1}  such 

⁹The Babylonians did not actually have equations but had an algorithmic method that amounted to completing the square.

¹⁰Cardan's name was Girolamo Cardano. The solution in the first case of the cubic seems to have been discovered by Scipione dal Ferro and later by Nicolo Tartaglia. Dal Ferro died in 1526 and passed the secret method to his student Antonio Fior. In 1535 Fior engaged in a public contest with Tartaglia at solving cubics, and he lost. Cardano wheedled the solution method in the first case from Tartaglia, published it in 1539, and discovered and published the solution in the second case. Cardano's student Lodovico Ferrari discovered how to solve quartics, and Cardano published that solution as well. See “St. Andrews” in the Selected References for more information.
that each subgroup is normal in the next larger one and each quotient group is simple. Such a series was defined to be a composition series for $G$. The Jordan-Hölder Theorem (Corollary 4.50) says that the respective consecutive quotients are isomorphic for any two composition series, apart from the order in which they appear. We define the finite group $G$ to be solvable if each of the consecutive quotients is cyclic of prime order, rather than nonabelian simple. It is enough that the group have a normal series for which each of the consecutive quotients is abelian.

Examples of solvable and nonsolvable groups are obtainable from the calculations in Section IV.8: abelian groups and groups of prime-power order are always solvable, the symmetric group $\mathfrak{S}_4$ and each of its subgroups are solvable, and the symmetric group $\mathfrak{S}_5$ is not solvable since a composition series is $\mathfrak{S}_5 \supseteq \mathfrak{A}_5 \supseteq \{1\}$ and the group $\mathfrak{A}_5$ is simple (Theorem 4.47).
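These two examples can be confirmed by brute force. The Python sketch below does not use composition series; instead it uses the derived series $G \supseteq G' \supseteq G'' \supseteq \cdots$, relying on the standard fact (not proved in this section) that a finite group is solvable exactly when this series reaches $\{1\}$. Permutations of $\{0, \dots, n-1\}$ are tuples of images, and all function names are illustrative.

```python
from itertools import permutations

# A permutation of {0, ..., n-1} is a tuple p with p[i] the image of i.

def compose(p, q):
    """The permutation p∘q (apply q first, then p)."""
    return tuple(p[i] for i in q)

def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def generated_subgroup(gens, n):
    """All products of generators; in a finite group this is the subgroup they generate."""
    identity = tuple(range(n))
    group, frontier = {identity}, [identity]
    while frontier:
        new = []
        for g in frontier:
            for s in gens:
                h = compose(g, s)
                if h not in group:
                    group.add(h)
                    new.append(h)
        frontier = new
    return group

def commutator_subgroup(group, n):
    comms = {compose(compose(inverse(a), inverse(b)), compose(a, b))
             for a in group for b in group}
    return generated_subgroup(comms, n)

def derived_series_orders(group, n):
    """Orders of G, G', G'', ... until the series reaches {1} or stops shrinking."""
    orders = [len(group)]
    while len(group) > 1:
        derived = commutator_subgroup(group, n)
        if len(derived) == len(group):
            break                      # G' = G, so the series can never reach {1}
        group = derived
        orders.append(len(group))
    return orders

def is_solvable(group, n):
    return derived_series_orders(group, n)[-1] == 1

S4 = set(permutations(range(4)))
S5 = set(permutations(range(5)))
```

For $\mathfrak{S}_4$ the orders descend $24, 12, 4, 1$, while for $\mathfrak{S}_5$ they stop at $120, 60$ because $\mathfrak{A}_5$ equals its own commutator subgroup.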

Modulo a precise definition for a field $k$ of the words “expressible in terms of the members of $k$ and radicals,” the answer to our main question is as follows.

Theorem 9.44 (Abel, Galois).¹¹ Let $k$ be a field of characteristic 0, let $F(X)$ be in $k[X]$, and let $K$ be a splitting field of $F(X)$ over $k$. Then the roots of $F(X)$ are expressible in terms of the members of $k$ and radicals if and only if the group $\mathrm{Gal}(K/k)$ is solvable.

Example. With $k = \mathbb{Q}$, let $F(X)$ be the polynomial $F(X) = X^5 - 5X + 1$ in $\mathbb{Q}[X]$. We shall show that

(i) $F(X)$ is irreducible over $\mathbb{Q}$,

(ii) $F(X)$ has three roots in $\mathbb{R}$ and one pair of conjugate complex roots in $\mathbb{C}$,

(iii) the splitting field $K$ over $\mathbb{Q}$ of any polynomial of degree 5 for which (i) and (ii) hold has Galois group with $\mathrm{Gal}(K/\mathbb{Q}) \cong \mathfrak{S}_5$.

We know from Theorem 4.47 that $\mathfrak{S}_5$ is not solvable, and Theorem 9.44 therefore allows us to conclude that the roots of $X^5 - 5X + 1$ are not expressible in terms of the members of $\mathbb{Q}$ and radicals.

To prove (i), we apply Eisenstein's criterion (Corollary 8.22) to the polynomial $F(X - 1) = X^5 - 5X^4 + 10X^3 - 10X^2 + 5$ and to the prime $p = 5$. Since $F(X)$ is irreducible if and only if $F(X - 1)$ is, the irreducibility is immediate.

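To prove (ii), we observe that $F(-2) < 0$, $F(0) > 0$, $F(1) < 0$, $F(2) > 0$. Applying the Intermediate Value Theorem (Section A3 of the appendix), we see that there are at least three roots in $\mathbb{R}$. Since $F'(X) = 5(X^4 - 1)$ has exactly the two roots $\pm 1$ in $\mathbb{R}$, $F(X)$ has at most three roots in $\mathbb{R}$ by an application of the Mean Value Theorem. The remaining two roots are therefore nonreal, and they are complex conjugates of each other since $F(X)$ has real coefficients.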
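Both computations above are finite arithmetic and can be double-checked mechanically. The Python sketch below expands $F(X - 1)$ binomially, tests Eisenstein's conditions at $p = 5$, and records the signs of $F$ at $-2, 0, 1, 2$. The function names are illustrative, and coefficient lists run from the highest degree down.

```python
from math import comb

# F(X) = X^5 - 5X + 1, coefficients listed from the highest degree down.
F = [1, 0, 0, 0, -5, 1]

def evaluate(coeffs, x):
    """Horner's rule."""
    value = 0
    for c in coeffs:
        value = value * x + c
    return value

def shift(coeffs, h):
    """Coefficients of F(X + h), by expanding each (X + h)^m binomially."""
    deg = len(coeffs) - 1
    out = [0] * (deg + 1)
    for i, c in enumerate(coeffs):
        m = deg - i                       # this coefficient multiplies X^m
        for j in range(m + 1):            # (X + h)^m = sum over j of C(m, j) h^(m - j) X^j
            out[deg - j] += c * comb(m, j) * h ** (m - j)
    return out

def is_eisenstein(coeffs, p):
    """Eisenstein's criterion at p for a polynomial listed from the top degree down."""
    lead, lower, const = coeffs[0], coeffs[1:], coeffs[-1]
    return (lead % p != 0
            and all(c % p == 0 for c in lower)
            and const % (p * p) != 0)

G = shift(F, -1)                                      # F(X - 1)
sign_pattern = [evaluate(F, x) > 0 for x in (-2, 0, 1, 2)]
```

Note that $F(X)$ itself is not Eisenstein at 5 (its constant term is 1), which is why the shift is needed.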

To  prove  (iii),  label  the  roots  1,  2,  3,  4,  5  with  1  and  2  denoting  the  nonreal 
roots.  Each  member  of  the  Galois  group  permutes  the  roots  and  is  determined 

¹¹Abel proved that there is no general solution via radicals that gives the roots of polynomials of degree 5. Galois found the present theorem, which shows how to decide the question for each individual polynomial of degree 5.
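by its effect on the roots. Thus $\mathrm{Gal}(K/\mathbb{Q})$ may be regarded as a subgroup of $\mathfrak{S}_5$. Since $F(X)$ is irreducible over $\mathbb{Q}$, adjoining one root gives an extension of degree 5 inside $K$; hence 5 divides $[K : \mathbb{Q}]$ and 5 divides $|\mathrm{Gal}(K/\mathbb{Q})|$. By the Sylow Theorems, $\mathrm{Gal}(K/\mathbb{Q})$ contains an element of order 5, hence a 5-cycle, since the 5-cycles are the only elements of order 5 in $\mathfrak{S}_5$. Some power of this 5-cycle carries root 1 to root 2. So, after relabeling roots 3, 4, 5 if necessary, we may assume that the 5-cycle is $(1\,2\,3\,4\,5)$. Also, $\mathrm{Gal}(K/\mathbb{Q})$ contains complex conjugation, which acts as $(1\,2)$. Then $\mathrm{Gal}(K/\mathbb{Q})$ contains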

$$(1\,2\,3\,4\,5)(1\,2)(1\,2\,3\,4\,5)^{-1} = (2\,3),$$

$$(1\,2\,3\,4\,5)(2\,3)(1\,2\,3\,4\,5)^{-1} = (3\,4),$$

$$(1\,2\,3\,4\,5)(3\,4)(1\,2\,3\,4\,5)^{-1} = (4\,5).$$

Since the set $\{(1\,2), (2\,3), (3\,4), (4\,5)\}$ of transpositions is easily shown from Corollary 1.22 to generate $\mathfrak{S}_5$, $\mathrm{Gal}(K/\mathbb{Q}) = \mathfrak{S}_5$.
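The group-theoretic part of (iii) can be checked by brute force. In the Python sketch below the roots $1, \dots, 5$ of the text are relabeled $0, \dots, 4$, permutations are tuples of images composed right to left, and the helper names are illustrative. It reproduces the three conjugations displayed above and confirms that the transpositions obtained generate a group of order $120 = |\mathfrak{S}_5|$.

```python
def compose(p, q):
    """p∘q on {0, ..., 4}: apply q first, then p."""
    return tuple(p[i] for i in q)

def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def transposition(i, j, n=5):
    t = list(range(n))
    t[i], t[j] = j, i
    return tuple(t)

def generated_subgroup(gens, n=5):
    """All products of the generators (enough for a finite group)."""
    identity = tuple(range(n))
    group, frontier = {identity}, [identity]
    while frontier:
        frontier_next = []
        for g in frontier:
            for s in gens:
                h = compose(g, s)
                if h not in group:
                    group.add(h)
                    frontier_next.append(h)
        frontier = frontier_next
    return group

five_cycle = (1, 2, 3, 4, 0)          # the text's (1 2 3 4 5): 0 -> 1 -> 2 -> 3 -> 4 -> 0
conj = transposition(0, 1)            # complex conjugation, the text's (1 2)

images = []
tau = conj
for _ in range(3):
    tau = compose(compose(five_cycle, tau), inverse(five_cycle))
    images.append(tau)                # the text's (2 3), (3 4), (4 5) in turn
```

The same check shows that the 5-cycle and the single transposition $(1\,2)$ already generate all of $\mathfrak{S}_5$.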

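Let $K'$ be a finite extension of the given field $k$. A root tower for $K'$ over $k$ is a finite sequence of extensions

$$k = K'_0 \subseteq K'_1 \subseteq \cdots \subseteq K'_{l-1} \subseteq K'_l = K'$$

such that for each $i$ with $0 \le i \le l - 1$, there is a prime number $n_i > 1$ and there is an element $r_i$ in $K'_{i+1}$ with $K'_{i+1} = K'_i(r_i)$, $a_i = r_i^{n_i}$ in $K'_i$, and $r_i$ not in $K'_i$. Then it follows that $r_i^k$ is not in $K'_i$ for any $k$ with $0 < k < n_i$: otherwise, choosing integers $c$ and $d$ with $ck + dn_i = 1$ (possible since $n_i$ is prime), we would have $r_i = (r_i^k)^c a_i^d$ in $K'_i$.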

(If we write $a_i = r_i^{n_i}$, then we might think of writing $K'_{i+1} = K'_i(\sqrt[n_i]{a_i})$, but this formulation is less precise at the moment since it does not specify precisely which choice of $\sqrt[n_i]{a_i}$ is to be used.)

With “root tower” now well defined, we can make a precise definition and thereby complete the precise formulation of Theorem 9.44. Let $k$ be the given field of characteristic 0, let $F(X)$ be in $k[X]$, and let $K$ be a splitting field of $F(X)$ over $k$. We say that the roots of $F(X)$ are expressible in terms of members of $k$ and radicals if there exists some finite extension $K'$ of $K$ having a root tower over $k$.

The  statement  of  Theorem  9.44  is  now  completely  precise,  and  the  remainder 
of  the  section  will  be  devoted  to  the  proof  of  one  direction  of  the  theorem:  if  the 
roots  are  expressible  in  terms  of  members  of  k  and  radicals,  then  the  Galois  group 
is  solvable.  The  proof  of  the  converse  direction  of  the  theorem  is  postponed  to 
Section  13.  We  begin  with  a  lemma. 

Lemma 9.45. Let $k$ be a field of any characteristic, and let $p$ be a prime number. If $a$ is a member of $k$ such that $X^p - a$ has no root in $k$, then $X^p - a$ is irreducible in $k[X]$.
PROOF. First suppose that $p$ is different from the characteristic. Let $L$ be a splitting field for $X^p - a$. The derivative $pX^{p-1}$ of $X^p - a$, evaluated at any root of $X^p - a$ in $L$, is nonzero (such a root is nonzero because $a \ne 0$, as otherwise 0 would be a root in $k$), and Corollary 9.17 shows that $X^p - a$ splits as the product of $p$ distinct linear factors in $L$. The quotient of any two roots of $X^p - a$ is a $p$th root of 1. Fixing one of these two roots of $X^p - a$ and letting the other vary, we obtain $p$ distinct $p$th roots of 1. Thus $L$ contains all $p$ of the $p$th roots of 1. Proposition 4.26 shows that the group of $p$th roots of 1 is cyclic. Let $\zeta$ be a generator. If $a^{1/p}$ denotes one of the roots of $X^p - a$ in $L$, then the set of all the roots is given by $\{a^{1/p}\zeta^k \mid 0 \le k \le p - 1\}$.

Now suppose that $X^p - a$ has a nontrivial factorization $X^p - a = F(X)G(X)$ in $k[X]$. Possibly by adjusting the leading coefficients of $F(X)$ and $G(X)$, we may assume that $F(X)$ and $G(X)$ are both monic. Unique factorization in $L[X]$ then implies that there is a nonempty subset $S$ of $\{k \mid 0 \le k \le p - 1\}$ with a nonempty complement $S^c$ such that

$$F(X) = \prod_{k \in S}\bigl(X - \zeta^k a^{1/p}\bigr) \qquad\text{and}\qquad G(X) = \prod_{k \in S^c}\bigl(X - \zeta^k a^{1/p}\bigr).$$

If $S$ has $m$ elements, then the constant term of $F(X)$ is $(-a^{1/p})^m\omega$, where $\omega$ is some $p$th root of 1. Thus $x = (a^{1/p})^m\omega$ is in $k$. Since $0 < m < p$, we have $\mathrm{GCD}(m, p) = 1$, and we can choose integers $c$ and $d$ with $cm + dp = 1$. Since $x$ is in $k$, so is $x^c a^d = (a^{1/p})^{mc + dp}\omega^c = a^{1/p}\omega^c$. But $a^{1/p}\omega^c$ is a root of $X^p - a$, in contradiction to the hypothesis that no root of $X^p - a$ lies in $k$. Hence $X^p - a$ is irreducible.

If $p$ equals the characteristic of $k$, then Lemma 9.18 gives the factorization $X^p - a = (X - a^{1/p})^p$, where $a^{1/p}$ is one root of $X^p - a$ in a splitting field $L$. Then we can argue as above except that $\zeta$ and $\omega$ are to be replaced by 1 throughout. This completes the proof of the lemma. □
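One instance of Lemma 9.45 can be tested exhaustively over a finite field. Take the field $\mathbb{F}_{11}$ and the prime $p = 5$ (different from the characteristic): the fifth powers in $\mathbb{F}_{11}$ are $0, 1, 10$, so for $a = 2, \dots, 9$ the lemma predicts that $X^5 - a$ is irreducible over $\mathbb{F}_{11}$. The Python sketch below tries every monic divisor of degree 1 or 2; the choice of field and all helper names are illustrative assumptions.

```python
from itertools import product

P = 11  # the field F_11; coefficient lists are highest degree first

def poly_rem(num, den, p=P):
    """Remainder of num modulo a monic polynomial den over F_p."""
    num = [c % p for c in num]
    while len(num) >= len(den):
        lead = num[0]
        for i, d in enumerate(den):
            num[i] = (num[i] - lead * d) % p
        num.pop(0)                        # the leading coefficient is now 0
    return num

def has_monic_factor(poly, degree, p=P):
    """True if some monic polynomial of the given degree divides poly over F_p."""
    for tail in product(range(p), repeat=degree):
        if not any(poly_rem(poly, [1, *tail], p)):
            return True
    return False

def is_irreducible(poly, p=P):
    """Reducible exactly when some monic factor has degree between 1 and deg/2."""
    deg = len(poly) - 1
    return not any(has_monic_factor(poly, d, p) for d in range(1, deg // 2 + 1))

fifth_powers = {pow(x, 5, P) for x in range(P)}
irreducible_a = [a for a in range(1, P) if is_irreducible([1, 0, 0, 0, 0, -a])]
```

Here $\mathbb{F}_{11}$ already contains the fifth roots of 1 (since 5 divides 10), which matches the situation in the proof.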

Proof of necessity in Theorem 9.44 that $\mathrm{Gal}(K/k)$ be solvable. We are to prove that if some finite extension $K'$ of $K$ has a root tower over $k$, then $\mathrm{Gal}(K/k)$ is solvable.

Step 1. We enlarge each field in the given root tower to obtain a root tower

$$k \subseteq K''_0 \subseteq K''_1 \subseteq \cdots \subseteq K''_{l-1} \subseteq K''_l = K''$$

of a finite extension $K''$ of $K'$ in such a way that $K''_0$ is the normal extension of $k$ obtained by adjoining all $n$th roots of 1 for a suitably large $n$ and such that each $K''_{i+1}$ is the normal extension of $K''_i$ for $0 \le i \le l - 1$ obtained by adjoining all $n_i$th roots of the member $a_i$ of $K'_i$. Using Theorem 9.22, choose an algebraic closure $\overline{K'}$ of $K'$. Let $n$ be the product of the integers $n_0, n_1, \dots, n_{l-1}$. Let $\zeta_1, \dots, \zeta_{n-1}$ be the $n$th roots of 1 in $\overline{K'}$ other than 1 itself, and define subfields of $\overline{K'}$ by

$$K''_i = K'_i(\zeta_1, \dots, \zeta_{n-1}) \qquad \text{for } 0 \le i \le l,$$
and put $K'' = K''_l$. The field $K''_0$ is a splitting field for $X^n - 1$ over $k$ and is therefore a normal extension. The field $K''_{i+1}$ is given by $K''_{i+1} = K''_i(r_i)$, where $r_i$ is a root in $K''_{i+1}$ of the polynomial $X^{n_i} - a_i$ in $K''_i[X]$. Here $n_i$ is prime. Lemma 9.45 shows that either $X^{n_i} - a_i$ has a root in $K''_i$ or $X^{n_i} - a_i$ is irreducible in $K''_i[X]$. In the first case, all roots of $X^{n_i} - a_i$, $r_i$ among them, lie in $K''_i$ because all $n_i$th roots of 1 lie in $K''_0$ (as $n_i$ divides $n$); thus $K''_{i+1} = K''_i$, and we have a normal extension. In the second case, $K''_{i+1}$ is a splitting field for $X^{n_i} - a_i$ over $K''_i$ because it is generated by $K''_i$ and one root of $X^{n_i} - a_i$ and because all $n_i$th roots of 1 already lie in $K''_0$; thus again we have a normal extension.

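Step 2. The Galois group of $K''_0$ over $k$ is abelian. In fact, Proposition 4.26 shows that the group of $n$th roots of 1 in $K''_0$ is cyclic. Let $\zeta$ be a generator, and let $U = \{\zeta^k\}_{k=0}^{n-1}$. The map of $\mathrm{Gal}(K''_0/k)$ into $\mathrm{Aut}\,U$ given by $\varphi \mapsto \varphi|_U$ is a homomorphism, and it is one-one because $K''_0 = k(U)$; moreover $\mathrm{Aut}\,U$ is isomorphic to $(\mathbb{Z}/n\mathbb{Z})^\times$. Since $(\mathbb{Z}/n\mathbb{Z})^\times$ is abelian, it follows that $\mathrm{Gal}(K''_0/k)$ is abelian.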

Step 3. The Galois group of $K''_{i+1}$ over $K''_i$ is trivial or is cyclic of order $n_i$. In fact, the Galois group is trivial if $K''_{i+1} = K''_i$. The contrary case is that $[K''_{i+1} : K''_i] = n_i$, and then $\mathrm{Gal}(K''_{i+1}/K''_i)$ has order $n_i$, which is prime. Every group of order $n_i$ is cyclic, and hence $\mathrm{Gal}(K''_{i+1}/K''_i)$ is cyclic.

Step 4. We extend the root tower to a larger field $L \supseteq K''$ that is a normal extension of $k$. The resulting root tower of $L$ will be written as

$$k \subseteq L_0 = K''_0 \subseteq L_1 = K''_1 \subseteq \cdots \subseteq L_{l-1} = K''_{l-1} \subseteq L_l = K''_l \subseteq L_{l+1} \subseteq \cdots \subseteq L_t = L.$$

As it is, we cannot say that $K''$ is the splitting field over $k$ for the product of the minimal polynomials used in Step 1, because the elements $a_i$ are not assumed to lie in $k$. To adjust the tower to correct this problem, write $K''$ as

$$K'' = k(r_0, r_1, \dots, r_{l-1}, \zeta) = k(x_0, \dots, x_l),$$

with $\zeta$ as in Step 2. Here $r_0, \dots, r_{l-1}$ are the given elements that define the original root tower, and we define $x_l = \zeta$ and $x_j = r_j$ for $0 \le j < l$. Since $K''$ is a finite extension of $k$, each $x_j$ has a minimal polynomial $G_j(X)$ over $k$. Define $G(X) = \prod_{j=0}^{l} G_j(X)$, and let $L$ be the splitting field of $G(X)$ in the algebraic closure $\overline{K'}$. The field $L$ is a normal extension of $k$. The roots of $G(X)$ are the members of $L$ that are roots of some $G_j(X)$. Each $x_j$ is a root of its own $G_j(X)$. If $x'_j$ is another root of $G_j(X)$, then there is a $k$ isomorphism of $k(x_j)$ onto $k(x'_j)$, and we know by the uniqueness of splitting fields (Theorem 9.13′)¹² that this

¹²The theorem is to be applied to $\sigma : k(x_j) \to k(x'_j)$ with $F(X) = F^\sigma(X) = G(X)$ and with $L' = L$.
extends to a $k$ isomorphism of $L$ onto $L$. Hence to each root $\theta$ of $G(X)$ in $L$ corresponds some $x_j$ and some $\varphi \in \mathrm{Gal}(L/k)$ with $\varphi(x_j) = \theta$. Thus

$$L = k\bigl(\{\varphi(x_j) \mid 0 \le j \le l \text{ and } \varphi \in \mathrm{Gal}(L/k)\}\bigr).$$

For any $\varphi$ in $\mathrm{Gal}(L/k)$ and any $j \le l - 1$, the element $\varphi(x_j)$ of $L$ satisfies

$$(\varphi(x_j))^{n_j} = \varphi(x_j^{n_j}) = \varphi(a_j),$$

and the element on the right is in $\varphi(K''_j)$. Any element $\varphi(\zeta)$ is an $n$th root of 1 and hence is already in $K''_0$; such elements are redundant for $\varphi \ne 1$. Enumerate $\mathrm{Gal}(L/k)$ as $\varphi_1, \dots, \varphi_s$ with $\varphi_1 = 1$. The tower for $K''$ is to be continued with the fields obtained by adjoining one at a time the elements

$$\varphi_2(r_0), \dots, \varphi_2(r_{l-1}),\ \varphi_3(r_0), \dots, \varphi_3(r_{l-1}),\ \dots,\ \varphi_s(r_0), \dots, \varphi_s(r_{l-1}).$$

The final field is $L$, and then we have an enlarged tower as asserted.

Step 5. $\mathrm{Gal}(L/k)$ is a solvable group. In fact, first we prove by induction downward on $i$ that $\mathrm{Gal}(L/L_i)$ is solvable, the case $i = t$ being the case of the trivial group. Let $i < t$ be given. We have arranged that $L_{i+1}$ is a normal extension of $L_i$. Since $L$ is normal over all the smaller fields by Step 4, Corollary 9.39 therefore gives $\mathrm{Gal}(L_{i+1}/L_i) \cong \mathrm{Gal}(L/L_i)/\mathrm{Gal}(L/L_{i+1})$. The group on the left side is cyclic by Step 3 or the analogous proof with some $r_j$ replaced by a suitable $\varphi(r_j)$, and thus a normal series with abelian quotients for $\mathrm{Gal}(L/L_{i+1})$ may be extended by including the term $\mathrm{Gal}(L/L_i)$, and the result is still a normal series with abelian quotients. Thus $\mathrm{Gal}(L/L_i)$ is solvable. This completes the induction and shows that $\mathrm{Gal}(L/L_0)$ is solvable. To complete the proof we use the isomorphism $\mathrm{Gal}(L_0/k) \cong \mathrm{Gal}(L/k)/\mathrm{Gal}(L/L_0)$ given by Corollary 9.39. The group on the left side is abelian by Step 2, and thus a normal series with abelian quotients for $\mathrm{Gal}(L/L_0)$ may be extended by including the term $\mathrm{Gal}(L/k)$, and the result is still a normal series with abelian quotients. Thus $\mathrm{Gal}(L/k)$ is solvable.

Step 6. $\mathrm{Gal}(K/k)$ is a solvable group. We have $L \supseteq K \supseteq k$ with $L/k$ normal by Step 4 and with $K/k$ normal since $K$ is a splitting field of $F(X)$ over $k$. Applying Corollary 9.39, we obtain an isomorphism $\mathrm{Gal}(K/k) \cong \mathrm{Gal}(L/k)/\mathrm{Gal}(L/K)$. Then Step 6 will follow from Step 5 if it is shown that any homomorphic image of a solvable group is solvable. Thus let $G$ be a solvable group, and let $\varphi : G \to H$ be an onto homomorphism. Write $G = G_1 \supseteq \cdots \supseteq G_m = \{1\}$ with each $G_{i+1}$ normal in $G_i$ and with abelian quotients, and define $H_i = \varphi(G_i)$; then $H_{i+1}$ is normal in $H_i$ because $\varphi$ carries $G_i$ onto $H_i$. Passage to the quotient gives us a homomorphism $\varphi_i$ carrying $G_i$ onto $H_i/H_{i+1}$. Since $\varphi(G_{i+1}) \subseteq H_{i+1}$, $\varphi_i$ induces a homomorphism $\bar\varphi_i$ of $G_i/G_{i+1}$ onto $H_i/H_{i+1}$. As the image of an abelian group under a homomorphism, $H_i/H_{i+1}$ is abelian. Therefore $H$ is solvable. This completes the proof. □
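Step 2 of the proof above rests on the identification of $\mathrm{Aut}\,U$, for $U$ cyclic of order $n$, with $(\mathbb{Z}/n\mathbb{Z})^\times$. Writing $U$ additively as $\mathbb{Z}/n\mathbb{Z}$, every endomorphism is $x \mapsto ux$ with $u$ the image of 1. The Python sketch below, for the sample value $n = 12$ (an arbitrary choice) and with illustrative names, finds the bijective ones by brute force and checks that composing them multiplies the multipliers mod $n$, so that they commute.

```python
from math import gcd

def as_map(u, n):
    """The endomorphism x -> u*x of the additive group Z/nZ, as a tuple of images."""
    return tuple((u * x) % n for x in range(n))

def automorphism_multipliers(n):
    """Images u of 1 for which x -> u*x is a bijection, i.e. the automorphisms."""
    return [u for u in range(n) if len(set(as_map(u, n))) == n]

def compose(f, g):
    """f after g, for maps given as tuples of images."""
    return tuple(f[x] for x in g)

n = 12
autos = automorphism_multipliers(n)
units = [u for u in range(n) if gcd(u, n) == 1]     # the group (Z/nZ)^x
maps = {u: as_map(u, n) for u in autos}
```

Composition corresponds to multiplication of multipliers, which is why the Galois group embedded in $\mathrm{Aut}\,U$ is abelian.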
12. Construction of Regular Polygons

Theorem 9.25 proved the constructibility of regular $n$-gons when $n$ is the product of a power of 2 and distinct Fermat primes, but it gave little clue how to carry out the construction. In this section we supply enough further detail so that one can actually carry out the construction. It is enough to handle the case that $n$ is a Fermat prime, $n = 2^{2^N} + 1$, and we shall suppose that $n$ is a prime of this form.

Let $\zeta = e^{2\pi i/n}$. The field of interest is $\mathbb{Q}(\zeta)$, with $[\mathbb{Q}(\zeta) : \mathbb{Q}] = n - 1$. The usual basis of $\mathbb{Q}(\zeta)$ over $\mathbb{Q}$ is $\{1, \zeta, \zeta^2, \dots, \zeta^{n-2}\}$, but we shall use the basis

$$\{\zeta, \zeta^2, \dots, \zeta^{n-1}\}$$

instead, in order to identify the Galois group $\mathrm{Gal}(\mathbb{Q}(\zeta)/\mathbb{Q})$ more readily with $\mathbb{F}_n^\times$, where $\mathbb{F}_n = \mathbb{Z}/n\mathbb{Z}$ is the field of $n$ elements. In more detail we associate the additive group of $\mathbb{F}_n$ with the additive group of exponents of the members of the cyclic group $\{1, \zeta, \zeta^2, \dots, \zeta^{n-1}\}$, and members of the Galois group correspond to the various multiplications of these exponents by $\mathbb{F}_n^\times = \{1, 2, \dots, n - 1\}$. The group $\mathbb{F}_n^\times$ is known to be cyclic of order $n - 1$, and thus the isomorphic Galois group is cyclic. If a generator $\sigma$ of the Galois group is to correspond to multiplication by a generator $g$ of $\mathbb{F}_n^\times$, then $\sigma(\zeta^s) = \zeta^{gs}$ for all $s$. With the prime $n$ of the form $2^{2^N} + 1$, let us note for the sake of completeness why we can always take $g = 3$.

Lemma 9.46. The number 3 is a generator of $\mathbb{F}_n^\times$ when $n$ is a prime of the form $2^{2^N} + 1$ with $N > 0$.

Remarks. We verified this assertion for $n = 17$ in Section 6, and in principle one could verify the lemma in any particular case in the same way. Here is a general argument using the law of quadratic reciprocity, whose full statement and proof will be given in Chapter I of Advanced Algebra. For a prime number $n$ that is congruent to 1 modulo 4, as $n = 2^{2^N} + 1$ is when $N > 0$, quadratic reciprocity implies that 3 is a square modulo $n$ if and only if $n$ is a square modulo 3. Since

$$2^{2^N} - 1 = \bigl(2^{2^{N-1}} + 1\bigr)\bigl(2^{2^{N-2}} + 1\bigr)\cdots\bigl(2^{2^1} + 1\bigr)\bigl(2^{2^1} - 1\bigr)$$

and $2^{2^1} - 1 = 3$, 3 divides $2^{2^N} - 1$. Thus $n = (2^{2^N} - 1) + 2$ is congruent to 2 modulo 3, $n$ is not a square modulo 3, and 3 is not a square modulo $n$. Since $\mathbb{F}_n^\times$ is cyclic of order $n - 1 = 2^{2^N}$, a power of 2, the nonsquares modulo $n = 2^{2^N} + 1$ are exactly the generators of $\mathbb{F}_n^\times$, and therefore 3 is a generator.
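For the four known Fermat primes beyond 3 the lemma can also be confirmed directly. The Python sketch below tests nonsquareness of 3 by Euler's criterion (a standard fact, used here in place of quadratic reciprocity) and computes the multiplicative order of 3 by brute force; the function names are illustrative.

```python
def multiplicative_order(g, n):
    """Smallest k >= 1 with g^k = 1 mod n (assumes GCD(g, n) = 1)."""
    x, k = g % n, 1
    while x != 1:
        x = (x * g) % n
        k += 1
    return k

def three_is_nonsquare(n):
    """Euler's criterion for an odd prime n: 3 is a nonsquare iff 3^((n-1)/2) = -1 mod n."""
    return pow(3, (n - 1) // 2, n) == n - 1

FERMAT_PRIMES = [5, 17, 257, 65537]   # 2^(2^N) + 1 for N = 1, 2, 3, 4

for n in FERMAT_PRIMES:
    print(n, three_is_nonsquare(n), multiplicative_order(3, n) == n - 1)
```

The hypothesis matters: modulo the prime 13, which is not of this form, 3 has order only 3.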

Taking Lemma 9.46 into account, we suppose for the remainder of this section that the generator $\sigma$ of the Galois group corresponds to multiplication of exponents of $\zeta$ by 3. Then $\sigma(\zeta) = \zeta^3$ and $\sigma(\zeta^s) = \zeta^{3s}$. These formulas and $\mathbb{Q}$ linearity tell us explicitly how $\sigma$ operates on all of $\mathbb{Q}(\zeta)$.
The fixed fields that arise within $\mathbb{Q}(\zeta)$ correspond to subgroups of the group $\mathrm{Gal}(\mathbb{Q}(\zeta)/\mathbb{Q}) = \{\sigma^j \mid 0 \le j < 2^{2^N}\}$, and there is one for each power of 2 from $2^0$ to $2^{2^N}$. Fix attention on the subgroup $H_l$ of order $l$, and write $2^{2^N} = kl$, with $k$ and $l$ being powers of 2. A generator of this subgroup is $\sigma^k$, and the subgroup is $H_l = \{1, \sigma^k, \sigma^{2k}, \dots, \sigma^{(l-1)k}\}$. Let $K_l$ be the fixed field of this subgroup, or equivalently of its generator $\sigma^k$; this has dimension $k$ over $\mathbb{Q}$.

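We shall determine a basis of $K_l$ over $\mathbb{Q}$. Since $\sigma(\zeta^s) = \zeta^{3s}$, we have $\sigma^k(\zeta^s) = \zeta^{3^k s}$. For $0 \le r \le k - 1$, the $k$ elements

$$\eta_r = \zeta^{3^r} + \zeta^{3^{r+k}} + \zeta^{3^{r+2k}} + \cdots + \zeta^{3^{r+k(l-1)}}$$

are linearly independent over $\mathbb{Q}$ because they involve disjoint sets of basis vectors of $\mathbb{Q}(\zeta)$ as $r$ varies. The computation

$$\begin{aligned}
\sigma^k(\eta_r) &= \sigma^k\bigl(\zeta^{3^r} + \zeta^{3^{r+k}} + \zeta^{3^{r+2k}} + \cdots + \zeta^{3^{r+k(l-1)}}\bigr)\\
&= \zeta^{3^{r+k}} + \zeta^{3^{r+2k}} + \zeta^{3^{r+3k}} + \cdots + \zeta^{3^{r+kl}}\\
&= \zeta^{3^r} + \zeta^{3^{r+k}} + \zeta^{3^{r+2k}} + \cdots + \zeta^{3^{r+k(l-1)}}\\
&= \eta_r,
\end{aligned}$$

in which the passage from the second line to the third uses $3^{kl} = 3^{n-1} \equiv 1 \pmod n$, shows that each of these vectors is in $K_l$. Since $K_l$ has dimension $k$ over $\mathbb{Q}$, $\{\eta_0, \dots, \eta_{k-1}\}$ is a basis of $K_l$ over $\mathbb{Q}$. The elements of this basis are called the periods of $l$ terms of the cyclotomic field.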
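The two combinatorial facts behind this argument, that the exponent sets of the $\eta_r$ partition $\{1, \dots, n - 1\}$ and that multiplication by $3^k$ (the action of $\sigma^k$ on exponents) carries each set to itself, can be checked exactly with integers. A Python sketch follows; the function names are illustrative, and the last usage remark shows what goes wrong when 3 is not a generator.

```python
def period_exponents(r, k, n):
    """Exponents s (mod n) with zeta^s occurring in eta_r, where l = (n - 1) / k."""
    l = (n - 1) // k
    return {pow(3, r + j * k, n) for j in range(l)}

def periods_are_disjoint_and_fixed(n, k):
    sets = [period_exponents(r, k, n) for r in range(k)]
    # The k sets together use each exponent 1, ..., n-1 exactly once.
    disjoint_cover = sorted(s for S in sets for s in S) == list(range(1, n))
    # sigma^k multiplies every exponent by 3^k; each set must be carried to itself.
    step = pow(3, k, n)
    fixed = all({(step * s) % n for s in S} == S for S in sets)
    return disjoint_cover and fixed
```

For $n = 5$ and $k = 2$ the sets are $\{1, 4\}$ and $\{3, 2\}$, and for $n = 17$ the check succeeds for every $k$ dividing 16; for the prime 13, where 3 has order 3, the sets overlap and the check fails.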

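The extreme cases for the periods are $(k, l) = (2^{2^N}, 1)$, for which $0 \le r \le 2^{2^N} - 1$ with $\eta_r = \zeta^{3^r}$, and $(k, l) = (1, 2^{2^N})$, for which $r = 0$ with

$$\eta_0 = \zeta^{3^0} + \zeta^{3^1} + \zeta^{3^2} + \cdots + \zeta^{3^{n-2}} = \zeta + \zeta^2 + \zeta^3 + \cdots + \zeta^{n-1} = -1.$$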


Two facts enter into determining how to write $\zeta$ in terms of rationals and square roots. The first is that at stage $k$ for $k \ge 2$, the sum of certain pairs of $\eta_r$'s is an $\eta$ for the previous stage $k/2$; namely, $\eta_r + \eta_{r+k/2}$ at stage $k$ is $\eta_r$ at stage $k/2$ for $0 \le r < k/2$. The second is that the product of two $\eta_r$'s at stage $k$ is an integer combination of $\eta_r$'s from the same stage and that the sum formulas express this combination in terms of $\eta$'s from earlier stages. The result is that at the $k$th stage we obtain expressions for the sum and product of two $\eta_r$'s in terms of $\eta$'s from earlier stages. Therefore the two $\eta_r$'s at stage $k$ are the roots of a quadratic equation whose coefficients involve $\eta$'s from earlier stages. Consequently we can compute the $\eta_r$'s explicitly by induction on $k$. To proceed further, we need to know the formula for the product of two $\eta_r$'s, which is due to Gauss.

To multiply two $\eta_r$'s, we need to multiply various powers of $\zeta$, and the exponents get added in the process. This addition is not readily compatible with terms like $\zeta^{3^r}$ and $\zeta^{3^s}$, and for that reason Gauss introduced new notation. Define

$$\eta^{(t)} = \zeta^{t} + \zeta^{t3^k} + \zeta^{t3^{2k}} + \cdots + \zeta^{t3^{(l-1)k}} = \sum_{v \bmod l} \zeta^{t3^{kv}}$$
for $0 \le t \le n - 1$. Then $\eta^{(0)} = l$, and for $0 < t \le n - 1$, $\eta^{(t)}$ is the $\eta_r$ in which $\zeta^t$ occurs. Gauss's product formula is given by


-3 

H 

M 

(  E  t;s3t“+t3tv\ 

u  mod  / 

v  mod  l 

ii 

M 

(  E  t;s3t,‘+r3ku‘+w) 

u  mod  / 

w  mod  l 

ii 

M 

(  E  K 

w  mod  / 

u  mod  l 

=  E 

w  mod  / 

In words, this says that to multiply two $\eta$'s, we add the $\eta$'s for the exponents obtained by multiplying the first term of $\eta^{(s)}$ by all the terms of $\eta^{(t)}$.
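Gauss's product formula, and the pair-sum fact used for passing between stages, can be spot-checked numerically. The Python sketch below evaluates $\eta^{(t)}$ and $\eta_r = \eta^{(3^r)}$ with floating-point complex arithmetic for a Fermat prime $n$ and a stage $k$ dividing $n - 1$, and measures how far the two sides of the formula differ. The function names are illustrative, and agreement is only up to rounding error.

```python
import cmath

def zeta_power(m, n):
    """zeta^m for zeta = e^(2 pi i / n)."""
    return cmath.exp(2j * cmath.pi * (m % n) / n)

def eta_sup(t, k, n):
    """Gauss's eta^(t): sum over v mod l of zeta^(t 3^(k v)), where l = (n - 1) / k."""
    l = (n - 1) // k
    return sum(zeta_power(t * pow(3, k * v, n), n) for v in range(l))

def eta_sub(r, k, n):
    """The period eta_r of l = (n - 1) / k terms, which equals eta^(3^r)."""
    return eta_sup(pow(3, r, n), k, n)

def product_formula_gap(s, t, k, n):
    """|eta^(s) eta^(t) - sum over w mod l of eta^(s + t 3^(k w))|; near 0 if the formula holds."""
    l = (n - 1) // k
    lhs = eta_sup(s, k, n) * eta_sup(t, k, n)
    rhs = sum(eta_sup(s + t * pow(3, k * w, n), k, n) for w in range(l))
    return abs(lhs - rhs)
```

For instance, with $n = 17$ and $k = 2$ the gap stays at rounding level for all $s, t$ modulo 17, and $\eta_0 + \eta_2$ at stage 4 agrees with $\eta_0$ at stage 2.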

At  this  point  it  is  more  illuminating  to  work  some  examples  than  to  try  for  a 
general  result. 


Example 1. $n = 5$, $N = 1$, $2^{2^N} = 4$. The relevant pairs $(k, l)$ to study in sequence are $(k, l) = (1, 4), (2, 2), (4, 1)$, and the case $(k, l) = (1, 4)$ is trivial since the only subscripted $\eta$ is $\eta_0 = \sum_{s=0}^{3} \zeta^{3^s} = -1$.


Figure 9.3. Construction of a regular pentagon. The circle with center … and radius … meets the line from … to the origin at a point at distance $\cos(2\pi/5)$ from the origin.


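For $k = 2$, i.e., for the case that there are 2 periods of 2 terms each, we go back to the definition of the $\eta$'s and find that

$$\eta_0 = \zeta^{3^0} + \zeta^{3^{0+2}} = \zeta + \zeta^{9} = \zeta + \zeta^{4}, \qquad \eta_1 = \zeta^{3^1} + \zeta^{3^{1+2}} = \zeta^{3} + \zeta^{27} = \zeta^{3} + \zeta^{2}.$$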
We form those sums of pairs of $\eta$'s that yield an $\eta$ from the previous step. Here
there is only one pair, and the sum is given by

$$\eta_0 + \eta_1 = -1.$$

Next we form the elements $\eta^{(t)}$, remembering that for $t > 0$, $\eta^{(t)}$ is the $\eta_r$ in
which $\zeta^{t}$ occurs. Then

$$\eta^{(0)} = 2,\quad \eta^{(1)} = \eta_0,\quad \eta^{(2)} = \eta_1,\quad \eta^{(3)} = \eta_1,\quad \eta^{(4)} = \eta_0.$$

We apply Gauss's product formula to compute the product of the two $\eta$'s whose
sum we have identified. The formula gives

$$\eta_0\eta_1 = \eta^{(1)}\eta^{(2)} = \eta^{(4)} + \eta^{(3)} = \eta_0 + \eta_1 = -1.$$

The second equality holds because the rule for the indices is to take the exponent 1 of the first term $\zeta^{1}$ of $\eta^{(1)}$ and add it to each exponent of $\zeta$ appearing in $\eta^{(2)} = \zeta^{2} + \zeta^{3}$.
Since $\eta_0$ and $\eta_1$ have sum $-1$ and product $-1$, they are the roots of the quadratic
equation

$$x^2 + x - 1 = 0, \quad\text{namely}\quad \tfrac12\big(-1 \pm \sqrt5\big).$$

Deciding which root is $\eta_0$ and which is $\eta_1$ involves looking at signs. The two
roots of the quadratic equation are of opposite sign because the constant term of
the quadratic equation is negative. Since $\eta_0 = \zeta + \zeta^{-1} = e^{2\pi i/5} + e^{-2\pi i/5} =
2\cos(2\pi/5)$ is positive, we obtain

$$\eta_0 = \tfrac12\big(-1 + \sqrt5\big) \quad\text{and}\quad \eta_1 = \tfrac12\big(-1 - \sqrt5\big).$$

The computation can in principle stop here, since knowing $\cos(2\pi/5)$ gives
us $\sin(2\pi/5)$ and therefore $e^{2\pi i/5}$. See Figure 9.3. But it is instructive to carry
out the algorithm anyway. We are thus to treat $k = 4$. The periods of 1 term are

$$\xi_0 = \zeta,\quad \xi_1 = \zeta^{3},\quad \xi_2 = \zeta^{4},\quad \xi_3 = \zeta^{2}.$$

The corresponding objects with superscripts are

$$\xi^{(0)} = 1,\quad \xi^{(1)} = \xi_0,\quad \xi^{(2)} = \xi_3,\quad \xi^{(3)} = \xi_1,\quad \xi^{(4)} = \xi_2.$$

The relevant sums of pairs are

$$\xi_0 + \xi_2 = \eta_0,\qquad \xi_1 + \xi_3 = \eta_1.$$

We again use Gauss's product formula, and this time we obtain

$$\xi_0\xi_2 = \xi^{(1)}\xi^{(4)} = \xi^{(5)} = \xi^{(0)} = 1.$$

Hence $\xi_0$ and $\xi_2$ are the roots of the quadratic equation

$$y^2 - \eta_0 y + 1 = 0, \quad\text{namely}\quad \frac{\eta_0 \pm i\sqrt{4 - \eta_0^2}}{2}.$$

The root $y$ involving the plus sign is $e^{2\pi i/5}$.
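A quick floating-point check of Example 1 compares the periods computed directly from $\zeta = e^{2\pi i/5}$ with the closed forms obtained above. This sketch uses only Python's standard `cmath` and `math` modules. It confirms the numbers, and it proves nothing.

```python
import cmath
import math

zeta = cmath.exp(2j * math.pi / 5)

# Periods of 2 terms, read off from the definition: eta_0 = zeta + zeta^4, eta_1 = zeta^3 + zeta^2.
eta0 = zeta + zeta**4
eta1 = zeta**3 + zeta**2

# Closed forms from x^2 + x - 1 = 0, with the signs chosen as in the text.
eta0_formula = (-1 + math.sqrt(5)) / 2
eta1_formula = (-1 - math.sqrt(5)) / 2

# xi_0 = zeta should be the root of y^2 - eta_0 y + 1 = 0 taken with the plus sign.
xi0_formula = (eta0_formula + 1j * math.sqrt(4 - eta0_formula**2)) / 2

print(abs(eta0 - eta0_formula), abs(eta1 - eta1_formula), abs(zeta - xi0_formula))
```

All three printed differences should be at the level of floating-point rounding.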
Example 2.$^{13}$ $n = 17$, $N = 2$, $2^{2^N} = 16$. The relevant pairs $(k, l)$ have
$kl = 16$, and the case $(k, l) = (1, 16)$ is trivial since the only subscripted $\eta$ is

$$\sum_{j=0}^{15} \zeta^{3^j} = -1.$$

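For $k = 2$, the 2 periods have 8 terms each, and

$$\begin{aligned}
\eta_0 &= \zeta^{3^{0+2\cdot0}} + \zeta^{3^{0+2\cdot1}} + \zeta^{3^{0+2\cdot2}} + \zeta^{3^{0+2\cdot3}} + \zeta^{3^{0+2\cdot4}} + \zeta^{3^{0+2\cdot5}} + \zeta^{3^{0+2\cdot6}} + \zeta^{3^{0+2\cdot7}}\\
&= \zeta + \zeta^{9} + \zeta^{13} + \zeta^{15} + \zeta^{16} + \zeta^{8} + \zeta^{4} + \zeta^{2},\\
\eta_1 &= \zeta^{3^{1+2\cdot0}} + \zeta^{3^{1+2\cdot1}} + \zeta^{3^{1+2\cdot2}} + \zeta^{3^{1+2\cdot3}} + \zeta^{3^{1+2\cdot4}} + \zeta^{3^{1+2\cdot5}} + \zeta^{3^{1+2\cdot6}} + \zeta^{3^{1+2\cdot7}}\\
&= \zeta^{3} + \zeta^{10} + \zeta^{5} + \zeta^{11} + \zeta^{14} + \zeta^{7} + \zeta^{12} + \zeta^{6}.
\end{aligned}$$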

We form those sums of pairs of $\eta$'s that yield an $\eta$ from the previous step. Here
there is only one pair, and the sum is given by

$$\eta_0 + \eta_1 = -1.$$


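Next we form the elements $\eta^{(t)}$, remembering that for $t > 0$, $\eta^{(t)}$ is the $\eta_r$ in
which $\zeta^{t}$ occurs. Then $\eta^{(0)} = 8$,

$$\begin{aligned}
\eta^{(1)} &= \eta^{(9)} = \eta^{(13)} = \eta^{(15)} = \eta^{(16)} = \eta^{(8)} = \eta^{(4)} = \eta^{(2)} = \eta_0,\\
\eta^{(3)} &= \eta^{(10)} = \eta^{(5)} = \eta^{(11)} = \eta^{(14)} = \eta^{(7)} = \eta^{(12)} = \eta^{(6)} = \eta_1.
\end{aligned}$$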


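To compute $\eta_0\eta_1$ by means of Gauss's product formula, we use $\eta_0 = \eta^{(1)}$ and
$\eta_1 = \eta^{(3)}$. Then

$$\eta_0\eta_1 = \eta^{(1)}\eta^{(3)} = \eta^{(4)} + \eta^{(11)} + \eta^{(6)} + \eta^{(12)} + \eta^{(15)} + \eta^{(8)} + \eta^{(13)} + \eta^{(7)},$$

the indices on the right side being the exponents of $\zeta$ in $\eta_1$, plus one. Resubstituting in
terms of $\eta_0$ and $\eta_1$, we obtain

$$\eta_0\eta_1 = 4\eta_0 + 4\eta_1 = -4.$$

Therefore $\eta_0$ and $\eta_1$ are the roots of the quadratic equation

$$x^2 + x - 4 = 0, \quad\text{namely}\quad \tfrac12\big(-1 \pm \sqrt{17}\big).$$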
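The same kind of numerical check applies to Example 2. The helper below has a hypothetical name. It sums $\zeta^{3^{r+kj}}$ over $0 \le j < l$, exactly as in the definition of the periods. The code then confirms the sum and the product of $\eta_0$ and $\eta_1$, together with the closed forms just found.

```python
import cmath
import math

def gaussian_period(r, k, n=17, g=3):
    """Sum of zeta^(g^(r + k j)) for 0 <= j < l = (n - 1)/k, zeta = exp(2 pi i / n), in floating point."""
    zeta = cmath.exp(2j * math.pi / n)
    l = (n - 1) // k
    return sum(zeta ** pow(g, r + k * j, n) for j in range(l))

eta0 = gaussian_period(0, 2)
eta1 = gaussian_period(1, 2)
print(eta0.real, eta1.real)
```

With $k = 4$ the same function returns the periods $\xi_r$ of 4 terms used below. This makes it easy to confirm numerically that $\xi_0\xi_2 = \xi_1\xi_3 = -1$.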

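Deciding which root is $\eta_0$ and which is $\eta_1$ involves looking at signs. The two
roots of the quadratic equation are of opposite sign. Since

$$\begin{aligned}
\eta_0 &= (\zeta + \zeta^{-1}) + (\zeta^{2} + \zeta^{-2}) + (\zeta^{4} + \zeta^{-4}) + (\zeta^{8} + \zeta^{-8})\\
&= 2\big(\cos(2\pi/17) + \cos(4\pi/17) + \cos(8\pi/17) + \cos(16\pi/17)\big)\\
&> 2\big(\tfrac12 + \tfrac12 + 0 + (-1)\big) = 0,
\end{aligned}$$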

13The  discussion  of  this  example  closely  follows  that  in  Van  der  Waerden,  Vol.  I,  Section  54. 
$\eta_0$ is the positive root, and we have

$$\eta_0 = \tfrac12\big(-1 + \sqrt{17}\big) \quad\text{and}\quad \eta_1 = \tfrac12\big(-1 - \sqrt{17}\big).$$

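For $k = 4$, the 4 periods have 4 terms each, and

$$\begin{aligned}
\xi_0 &= \zeta^{3^{0+4\cdot0}} + \zeta^{3^{0+4\cdot1}} + \zeta^{3^{0+4\cdot2}} + \zeta^{3^{0+4\cdot3}} = \zeta + \zeta^{13} + \zeta^{16} + \zeta^{4},\\
\xi_1 &= \zeta^{3^{1+4\cdot0}} + \zeta^{3^{1+4\cdot1}} + \zeta^{3^{1+4\cdot2}} + \zeta^{3^{1+4\cdot3}} = \zeta^{3} + \zeta^{5} + \zeta^{14} + \zeta^{12},\\
\xi_2 &= \zeta^{3^{2+4\cdot0}} + \zeta^{3^{2+4\cdot1}} + \zeta^{3^{2+4\cdot2}} + \zeta^{3^{2+4\cdot3}} = \zeta^{9} + \zeta^{15} + \zeta^{8} + \zeta^{2},\\
\xi_3 &= \zeta^{3^{3+4\cdot0}} + \zeta^{3^{3+4\cdot1}} + \zeta^{3^{3+4\cdot2}} + \zeta^{3^{3+4\cdot3}} = \zeta^{10} + \zeta^{11} + \zeta^{7} + \zeta^{6}.
\end{aligned}$$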

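The sums of pairs of these that yield $\eta$'s are

$$\xi_0 + \xi_2 = \eta_0,\qquad \xi_1 + \xi_3 = \eta_1.$$

We can read off superscripted $\xi$'s from the exponents on the right sides of the
formulas for $\xi_0, \dots, \xi_3$, and the results are

$$\begin{aligned}
\xi^{(1)} &= \xi^{(13)} = \xi^{(16)} = \xi^{(4)} = \xi_0,\\
\xi^{(3)} &= \xi^{(5)} = \xi^{(14)} = \xi^{(12)} = \xi_1,\\
\xi^{(9)} &= \xi^{(15)} = \xi^{(8)} = \xi^{(2)} = \xi_2,\\
\xi^{(10)} &= \xi^{(11)} = \xi^{(7)} = \xi^{(6)} = \xi_3.
\end{aligned}$$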

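Then the relevant products are

$$\begin{aligned}
\xi_0\xi_2 &= \xi^{(1)}\xi^{(9)} = \xi^{(10)} + \xi^{(16)} + \xi^{(9)} + \xi^{(3)} = \xi_3 + \xi_0 + \xi_2 + \xi_1 = -1,\\
\xi_1\xi_3 &= \xi^{(3)}\xi^{(6)} = \xi^{(13)} + \xi^{(14)} + \xi^{(10)} + \xi^{(9)} = \xi_0 + \xi_1 + \xi_3 + \xi_2 = -1.
\end{aligned}$$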

Thus $\xi_0$ and $\xi_2$ are the roots of the quadratic equation

$$y^2 - \eta_0 y - 1 = 0,$$

while $\xi_1$ and $\xi_3$ are the roots of the quadratic equation

$$y^2 - \eta_1 y - 1 = 0.$$

Since $\xi_0\xi_2$ and $\xi_1\xi_3$ are negative, these equations each have roots of opposite
sign. We observe that $\xi_0 = 2\big(\cos(2\pi/17) + \cos(8\pi/17)\big) > 0$ and that $\xi_3 =
2\big(\cos(14\pi/17) + \cos(12\pi/17)\big) < 0$, and we conclude that the signs are

$$\xi_0 > 0 \ \text{and}\ \xi_2 < 0,\qquad \xi_1 > 0 \ \text{and}\ \xi_3 < 0.$$
Figure 9.4. Construction of a regular 17-gon. The small circle has center $(\tfrac18, \tfrac12)$
and radius $\tfrac18$. Two circles are drawn tangent to it with center $(0, 0)$; their radii
are $\eta_0/4$ and $|\eta_1|/4$. Their $x$ intercepts and height $\tfrac12$ determine the dashed box.
The diameter of the large solid semicircle is $\xi_0/2$, and its heavy part is $\lambda_0/2$.
The separate semicircle at the left constructs $\sqrt{\xi_1/4}$ from $\xi_1/2$, and the chord
in the large semicircle is at distance $\sqrt{\xi_1/4}$ from the diameter.


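For $k = 8$, the 8 periods have 2 terms each, and the two with sum $\xi_0$ are

$$\begin{aligned}
\lambda_0 &= \zeta^{3^{0+8\cdot0}} + \zeta^{3^{0+8\cdot1}} = \zeta + \zeta^{16},\\
\lambda_4 &= \zeta^{3^{4+8\cdot0}} + \zeta^{3^{4+8\cdot1}} = \zeta^{13} + \zeta^{4}.
\end{aligned}$$

Their sum and their product are given by

$$\lambda_0 + \lambda_4 = \xi_0,\qquad \lambda_0\lambda_4 = \zeta^{14} + \zeta^{5} + \zeta^{12} + \zeta^{3} = \xi_1.$$

Thus $\lambda_0$ and $\lambda_4$ are the roots of the quadratic equation

$$z^2 - \xi_0 z + \xi_1 = 0.$$

Since $\lambda_0 = 2\cos(2\pi/17) > 2\cos(8\pi/17) = \lambda_4$, $\lambda_0$ is the larger of the two roots
of the equation.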

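In summary, we have successively defined

$$\begin{aligned}
\eta_0 &= \tfrac12\big(-1 + \sqrt{17}\big) &&\text{and}\quad \eta_1 = \tfrac12\big(-1 - \sqrt{17}\big),\\
\xi_0 &= \tfrac12\big(\eta_0 + \sqrt{\eta_0^2 + 4}\big) &&\text{and}\quad \xi_2 = \tfrac12\big(\eta_0 - \sqrt{\eta_0^2 + 4}\big),\\
\xi_1 &= \tfrac12\big(\eta_1 + \sqrt{\eta_1^2 + 4}\big) &&\text{and}\quad \xi_3 = \tfrac12\big(\eta_1 - \sqrt{\eta_1^2 + 4}\big),\\
\lambda_0 &= \tfrac12\big(\xi_0 + \sqrt{\xi_0^2 - 4\xi_1}\big).
\end{aligned}$$

Since $\lambda_0 = 2\cos(2\pi/17)$, these formulas explicitly point to how to construct a
regular 17-gon. See Figure 9.4.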
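Every quantity in the summary is real, so the nested square roots can be evaluated directly in floating point and compared with $2\cos(2\pi/17)$. The following Python lines sketch that check and use nothing beyond the standard `math` module.

```python
import math

# Stage k = 2: the two periods of 8 terms.
eta0 = (-1 + math.sqrt(17)) / 2
eta1 = (-1 - math.sqrt(17)) / 2

# Stage k = 4: the positive roots of y^2 - eta_0 y - 1 = 0 and y^2 - eta_1 y - 1 = 0.
xi0 = (eta0 + math.sqrt(eta0**2 + 4)) / 2
xi1 = (eta1 + math.sqrt(eta1**2 + 4)) / 2

# Stage k = 8: the larger root of z^2 - xi_0 z + xi_1 = 0.
lam0 = (xi0 + math.sqrt(xi0**2 - 4 * xi1)) / 2

print(lam0, 2 * math.cos(2 * math.pi / 17))
```

Only square roots and field operations occur here. That is why each stage is constructible with straightedge and compass.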


13.  Solution  of  Certain  Polynomial 
Equations  with  Solvable  Galois  Group 

In  this  section  we  investigate  what  specific  information  can  be  deduced  about  a 
finite  Galois  extension  in  characteristic  0  when  the  Galois  group  is  solvable. 
The  tool  is  a  precursor  of  modern  harmonic  analysis14  known  as  “Lagrange 
resolvents.”  The  argument  of  the  previous  section  could  be  regarded  as  an  instance 
of  applying  the  theory  of  Lagrange  resolvents,  but  Lagrange  resolvents  give  only 
the  simpler  formulas  of  the  previous  section,  not  the  Gauss  product  formula. 

Proposition 9.47. Let $K$ be a finite normal extension of a field $k$ of characteristic 0, suppose that $\mathrm{Gal}(K/k)$ is cyclic of order $n$ with $\sigma$ as a generator, and
suppose that $X^n - 1$ splits in $k$. Fix a generator $\sigma$ of $\mathrm{Gal}(K/k)$ and a primitive
$n$th root $\omega$ of 1 in $k$. For $0 \le r < n$, define $k$ linear maps $E_r : K \to K$ by

$$E_r x = n^{-1} \sum_{k \bmod n} \omega^{-kr} \sigma^k x \qquad \text{for } x \in K.$$

Then

(a) $E_rE_s$ equals $E_s$ if $r \equiv s \bmod n$ and equals 0 if $r \not\equiv s \bmod n$, so that the $E_r$'s are
commuting projection operators whose images are linearly independent,

(b) $\sum_{r \bmod n} E_r = I$, so that the direct sum of the images of the $E_r$'s is
all of $K$,

(c) $\sigma(x) = \omega^r x$ for all $r$ and for all $x$ in $\operatorname{image} E_r$,

(d) $\operatorname{image} E_0 = k$.

Remarks. The integers $k$ and $r$ depend only on their values modulo $n$, and the
summation indices “$k \bmod n$” and “$r \bmod n$” are to be interpreted accordingly.
The operators $E_r$ are known classically as Lagrange resolvents, apart from
the constant $n^{-1}$. The proposition says that the $k$ linear map $\sigma$ has a basis of
eigenvectors, that the eigenvalues are a subset of the powers of $\omega$, and that each $E_r$
is the projection operator on the eigenspace for the eigenvalue $\omega^r$ along the sum
of the remaining eigenspaces.

14Lagrange  resolvents  give  a  certain  specific  Fourier  decomposition  relative  to  a  cyclic  group. 
Similar formulas apply whenever a cyclic group acts linearly on a vector space over $k$ and the relevant
roots of 1 lie in $k$. For the corresponding decomposition of a vector space over $\mathbb{C}$ when a finite group
$G$ acts linearly, see Problems 47-52 at the end of Chapter VII. The decomposition in those problems
can be seen to work for any field $k$ of characteristic 0 for which the values of all irreducible characters
of $G$ lie in $k$. The values of the characters are sums of certain roots of 1, and thus it is enough that
$k$ contain a certain finite set of roots of 1.
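Proof. For $x$ in $K$, we compute

$$\begin{aligned}
E_rE_s x &= n^{-2} \sum_{k \bmod n} \omega^{-kr} \sigma^k\Big(\sum_{l \bmod n} \omega^{-ls} \sigma^l x\Big)\\
&= n^{-2} \sum_{k \bmod n}\,\sum_{m \bmod n} \omega^{-kr}\, \omega^{-ms+ks}\, \sigma^{m} x\\
&= n^{-2} \sum_{m \bmod n} \Big(\sum_{k \bmod n} \omega^{k(s-r)}\Big) \omega^{-ms} \sigma^m x,
\end{aligned}$$

where the second step substitutes $m = k + l$ and uses the fact that $\sigma^k$ fixes $\omega$, which lies in $k$.
The expression in parentheses on the right side is the sum of a finite geometric
series. If $s \equiv r \bmod n$, then every term in the sum is 1, and the sum is $n$. If
$s \not\equiv r \bmod n$, then the sum is $\dfrac{\omega^{n(s-r)} - 1}{\omega^{s-r} - 1} = 0$. Thus (a) follows.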

Next we calculate

$$\sum_{r \bmod n} E_r x = \sum_{r \bmod n} n^{-1} \sum_{k \bmod n} \omega^{-kr} \sigma^k x = \sum_{k \bmod n} n^{-1}\Big(\sum_{r \bmod n} \omega^{-kr}\Big) \sigma^k x.$$

As in the previous paragraph, the sum in parentheses is $n$ if $k \equiv 0$ and it is 0 if
$k \not\equiv 0 \bmod n$. Therefore only the $k = 0$ term on the right side contributes, and
the right side simplifies to $x$. This proves (b).

The computation

$$\begin{aligned}
\sigma(E_r x) &= n^{-1} \sum_{k \bmod n} \omega^{-kr} \sigma^{k+1} x
= n^{-1} \sum_{l \bmod n} \omega^{(-l+1)r} \sigma^{l} x\\
&= \omega^{r} n^{-1} \sum_{l \bmod n} \omega^{-lr} \sigma^{l} x = \omega^{r} E_r x
\end{aligned}$$

shows that $\sigma(y) = \omega^r y$ for every $y$ of the form $E_r x$, and these $y$'s are the members
of the image of $E_r$. This proves (c).

Combining (b) and (c), we see that $\sigma(x) = x$ if and only if $x$ is in $\operatorname{image} E_0$.
Since $\mathrm{Gal}(K/k)$ is cyclic, the members of $K$ fixed by $\sigma$ are the members fixed
by the Galois group, and these are the members of $k$ by Proposition 9.35d. This
proves (d). □
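The algebra in parts (a) through (c) of the proof uses only linearity, the relation $\sigma^n = 1$, and the fact that $\omega$ is a scalar. It can therefore be watched in action on any space carrying such an operator (compare footnote 14). The Python sketch below illustrates this under stated assumptions and is not a computation inside a field. It replaces $K$ by $\mathbb{C}^n$, takes $\sigma$ to be the cyclic shift of coordinates, and uses $\omega = e^{2\pi i/n}$. The function names are hypothetical.

```python
import cmath
import math

def shift(x):
    """sigma: (x_0, ..., x_{n-1}) -> (x_{n-1}, x_0, ..., x_{n-2}); sigma^n is the identity."""
    return [x[-1]] + x[:-1]

def resolvent(r, x):
    """E_r x = n^{-1} * sum_{k mod n} omega^{-k r} sigma^k x, with omega = exp(2 pi i / n)."""
    n = len(x)
    omega = cmath.exp(2j * math.pi / n)
    total = [0j] * n
    y = list(x)                      # y runs through sigma^k x for k = 0, 1, ..., n - 1
    for k in range(n):
        c = omega ** (-k * r)
        total = [t + c * yi for t, yi in zip(total, y)]
        y = shift(y)
    return [t / n for t in total]

x = [1.0, 2.0, -3.0, 0.5, 4.0]
pieces = [resolvent(r, x) for r in range(len(x))]
```

Here `pieces[r]` plays the role of $E_r x$. The pieces sum to `x` as in (b), and the shift multiplies `pieces[r]` by $\omega^r$ as in (c). In addition, `pieces[0]` is the constant vector whose entries equal the average of `x`. This is the shift-invariant part, the analogue of (d).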

Corollary 9.48. Let $K$ be a finite normal extension of a field $k$ of characteristic 0, suppose that $\mathrm{Gal}(K/k)$ is cyclic of prime order $p$, and suppose that $X^p - 1$
splits in $k$. Then there exist $a$ in $k$ and $x$ in $K$ such that $x^p = a$ and $K = k(x)$.

Remarks. In other words, a finite normal extension field in characteristic 0
with Galois group cyclic of prime order $p$ is necessarily obtained by adjoining a
$p$th root of some element of the base field, provided that the base field contains
all the $p$th roots of 1. Once the extension field contains one $p$th root of an element
of the base field, it has to contain all $p$th roots, since the base field by assumption
contains a full complement of $p$th roots of 1.
Proof. We apply Proposition 9.47 with $n = p$. Since $[K : k] = p > 1$, (d)
shows that $E_0$ is not the identity. By (b), some $E_r$ with $r \not\equiv 0$ is not the 0 operator.
Let $x$ be a nonzero element in $\operatorname{image} E_r$. Since the generator $\sigma$ of the Galois group
is a field automorphism, $\sigma(x^p) = \sigma(x)^p = (\omega^r x)^p = \omega^{rp} x^p = x^p$. Since $x^p$ is
fixed by the Galois group, $x^p$ lies in $k$. Then the element $a = x^p$ has the property
that $x^p = a$ and $K \supseteq k(x) \supsetneq k$, the last inclusion being proper because $\sigma(x) = \omega^r x \ne x$. Since $[K : k]$ is prime, Corollary 9.7 shows that
there are no intermediate fields strictly between $K$ and $k$. Therefore $K = k(x)$. □

We shall apply Corollary 9.48 to prove the converse statement in Theorem
9.44, namely that solvability of the Galois group for a polynomial equation in
characteristic 0 implies that the solutions of the equation are expressible in terms of
radicals and the base field. We begin with a lemma that handles a special case.

Lemma 9.49. Let $k$ be a field of characteristic 0, let $n > 0$ be an integer,
and let $K$ be a splitting field for $\prod_{r=1}^{n} (X^r - 1)$ over $k$. Then $K/k$ is a Galois
extension, the Galois group $\mathrm{Gal}(K/k)$ is abelian, and $K$ has a root tower over $k$.

PROOF. Being a splitting field in characteristic 0, $K$ is a finite Galois extension
of $k$. For $1 \le r \le n$, let $\omega_r$ be a primitive $r$th root of 1 in $K$. The primitive
$r$th roots of 1 are parametrized by the group $(\mathbb{Z}/r\mathbb{Z})^\times$ once some $\omega_r$ is specified,
the parametrization being $k \mapsto \omega_r^k$. If $\sigma$ is in $\mathrm{Gal}(K/k)$, then $\sigma(\omega_r) = \omega_r^k$ for
some such $k$. This correspondence respects multiplication in $(\mathbb{Z}/r\mathbb{Z})^\times$ since if
$\sigma(\omega_r) = \omega_r^k$ and $\sigma'(\omega_r) = \omega_r^l$, then $\sigma'(\sigma(\omega_r)) = \sigma'(\omega_r^k) = \sigma'(\omega_r)^k = \omega_r^{kl}$.
Thus for each $r$, we have a homomorphism of $\mathrm{Gal}(K/k)$ into the abelian group
$(\mathbb{Z}/r\mathbb{Z})^\times$. Putting these homomorphisms together as $r$ varies and using the fact
that the $\omega_r$'s generate $K$ over $k$, we obtain a one-one homomorphism of $\mathrm{Gal}(K/k)$
into the abelian group $\prod_{r=1}^{n} (\mathbb{Z}/r\mathbb{Z})^\times$. Consequently $\mathrm{Gal}(K/k)$ is isomorphic to
a subgroup of an abelian group and is abelian.

It follows from Corollary 9.39 that every extension of intermediate fields is
Galois and has abelian Galois group. For $1 \le r \le n$, we introduce the intermediate field $K_r = k(\omega_1, \omega_2, \dots, \omega_r)$. Here $K_1 = k(1) = k$. For $1 < r \le n$, $K_r$ is
generated as a vector space over $K_{r-1}$ by $\omega_r, \omega_r^2, \dots, \omega_r^{r-1}$ since $\sum_{j=0}^{r-1} \omega_r^j = 0$
for $r > 1$, and thus $[K_r : K_{r-1}] \le r - 1$. Since $\mathrm{Gal}(K_r/K_{r-1})$ is abelian, it has
a composition series whose consecutive quotients are cyclic of prime order, the
prime order necessarily being $\le [K_r : K_{r-1}] \le r - 1$. Applying Galois theory,
form the chain of intermediate extensions between $K_{r-1}$ and $K_r$. The degree of
each extension is some prime $p$ with $p \le r - 1$, the prime depending on the two
fields in the chain. The $p$th roots of unity are in the smaller of any two consecutive
fields because they are in $K_{r-1}$ (as $p \le r - 1$). By Corollary 9.48, such a degree-$p$ extension
between $K_{r-1}$ and $K_r$ is generated by the smaller field and the $p$th root of an
element in the smaller field. Since $K_1 = k$, we see inductively that $K_r$ has a root
tower over $K_{r-1}$ for each $r$. Since $K = K_n$, $K$ has a root tower over $k$. □
Proof of sufficiency in Theorem 9.44 that $\mathrm{Gal}(K/k)$ be solvable. Let
$F(X)$ be in $k[X]$, and suppose that $K$ is a splitting field of $F(X)$ over $k$. Under
the assumption that $\mathrm{Gal}(K/k)$ is solvable, we are to prove that there exists a finite
extension $K'$ of $K$ having a root tower over $k$.

Since $G = \mathrm{Gal}(K/k)$ is solvable, we can find a finite sequence of subgroups
of $G$, each normal in the next larger one, such that the quotient of any consecutive
pair is cyclic of prime order. We write

$$G = H_0 \supseteq H_1 \supseteq \cdots \supseteq H_{k-1} \supseteq H_k = \{1\}$$

with $H_j/H_{j+1}$ cyclic of prime order $p_j$ for $0 \le j < k$. Let

$$k = K_0 \subseteq K_1 \subseteq \cdots \subseteq K_{k-1} \subseteq K_k = K$$

be the corresponding sequence of intermediate fields given by the Fundamental
Theorem of Galois Theory (Theorem 9.38). Here $K_j = K^{H_j}$, and $H_j =
\mathrm{Gal}(K/K_j)$.

According to Corollary 9.39, $K_{j+1}$ is a normal extension of $K_j$ if and only if
$\mathrm{Gal}(K/K_{j+1})$ is a normal subgroup of $\mathrm{Gal}(K/K_j)$, and in this case we have a
group isomorphism $\mathrm{Gal}(K/K_j)/\mathrm{Gal}(K/K_{j+1}) \cong \mathrm{Gal}(K_{j+1}/K_j)$. Since $H_{j+1}$ is
a normal subgroup of $H_j$ with quotient cyclic of order $p_j$, it follows that $K_{j+1}/K_j$
is indeed normal and the Galois group is cyclic of order $p_j$.

Let us use Theorem 9.22 to regard $K$ as lying in a fixed algebraic closure $\overline{K}$.
Let $n$ be the product of all the primes $p_j$, and let $K'_0$ be the splitting field over
$k$ for $\prod_{r=1}^{n} (X^r - 1)$ within $\overline{K}$. For $1 \le j \le k$, let $K'_j$ be the subfield of $\overline{K}$
generated by $K_j$ and $K'_0$. We define $K' = K'_k$. Then we have

$$k \subseteq K'_0 \subseteq K'_1 \subseteq \cdots \subseteq K'_{k-1} \subseteq K'_k = K'.$$

Lemma 9.49 shows that $K'_0$ has a root tower over $k$. To complete the proof, it is
enough to show for each $j \ge 0$ that either $K'_{j+1} = K'_j$ or else $[K'_{j+1} : K'_j] = p_j$
and $K'_{j+1}$ is generated by $K'_j$ and the $p_j$th root of some member of $K'_j$.

For each $j \ge 0$, suppose that $K_{j+1} = K_j(x_j)$. Let $F_j(X)$ be the minimal polynomial of $x_j$ over $K_j$. Since $K_{j+1}/K_j$ is normal, $K_{j+1}$ is the splitting field of $F_j(X)$
over $K_j$. Then $K'_{j+1} = K'_j(x_j)$ is the splitting field of $F_j(X)\prod_{r=1}^{n}(X^r - 1)$ over
$K'_j$, and consequently $K'_{j+1}/K'_j$ is a normal extension. If $g$ is in $\mathrm{Gal}(K'_{j+1}/K'_j)$,
then $g$ sends $x_j$ into a root of $F_j(X)$ and is determined by this root. The restriction
$g|_{K_{j+1}}$ therefore carries $K_{j+1}$ into itself and is in $\mathrm{Gal}(K_{j+1}/K_j)$. Since $g$ is
determined by $g(x_j)$, the group homomorphism $g \mapsto g|_{K_{j+1}}$ is one-one. The
image of this homomorphism must be a subgroup of $\mathrm{Gal}(K_{j+1}/K_j)$ and therefore
must be trivial or have $p_j$ elements. In the first case, $K'_{j+1} = K'_j$, and in the
second case, $[K'_{j+1} : K'_j] = p_j$. In the latter case, $K'_j$ contains all $p_j$ of the $p_j$th
roots of 1 since these roots of 1 are in $K'_0$; by Corollary 9.48, $K'_{j+1}$ is generated
by $K'_j$ and a $p_j$th root of some member of $K'_j$. This completes the proof. □
We turn now to apply our methods to irreducible cubics over a field $k$ of characteristic 0. In effect we shall derive Cardan's formula,15 which was mentioned
at the beginning of Section 11.

The Galois group of a splitting field of a cubic polynomial has to be a subgroup
of the symmetric group $\mathfrak{S}_3$, and irreducibility of the cubic implies that the Galois
group has to contain a 3-cycle. Therefore the Galois group has to be either $\mathfrak{S}_3$ or
the alternating group $\mathfrak{A}_3 \cong C_3$.

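Let the cubic be $X^3 + a_2X^2 + a_1X + a_0$, the coefficients being in $k$. Substituting
$X = Z - \tfrac13 a_2$ converts the polynomial into

$$\begin{aligned}
&(Z - \tfrac13 a_2)^3 + a_2(Z - \tfrac13 a_2)^2 + a_1(Z - \tfrac13 a_2) + a_0\\
&\qquad = Z^3 + \big(a_1 - \tfrac13 a_2^2\big)Z + \big(a_0 - \tfrac13 a_1a_2 + \tfrac{2}{27}a_2^3\big),
\end{aligned}$$

and therefore we can assume whenever convenient that the given polynomial has
$a_2 = 0$.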
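The substitution can be double-checked with exact rational arithmetic. The helper `depress` below is hypothetical. It returns the coefficients $p = a_1 - \tfrac13 a_2^2$ and $q = a_0 - \tfrac13 a_1a_2 + \tfrac{2}{27}a_2^3$ of $Z^3 + pZ + q$ read off from the display above.

```python
from fractions import Fraction

def depress(a2, a1, a0):
    """Return (p, q) with X^3 + a2 X^2 + a1 X + a0 = Z^3 + p Z + q when X = Z - a2/3."""
    a2, a1, a0 = Fraction(a2), Fraction(a1), Fraction(a0)
    p = a1 - a2**2 / 3
    q = a0 - a1 * a2 / 3 + Fraction(2, 27) * a2**3
    return p, q

# X^3 + 6X^2 - X + 5 with X = Z - 2.
p, q = depress(6, -1, 5)
print(p, q)  # → -13 23
```

Exact fractions are used here, rather than floats, so that the two polynomials can be compared for equality at sample points without any rounding tolerance.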

Suppose for the moment that the Galois group is $G = \mathfrak{S}_3$. A composition
series is

$$G = \mathfrak{S}_3 \supseteq \mathfrak{A}_3 \supseteq \{1\},$$

and we can write the corresponding sequence of fixed fields as

$$k \subseteq L \subseteq K,$$

where $K$ is the splitting field and $L$ is $K^{\mathfrak{A}_3}$. The dimensions satisfy $[L : k] = 2$
and $[K : L] = 3$.

Let the roots in $K$ of the given cubic be $r_1, r_2, r_3$. Since $G$ is solvable, Theorem
9.44 tells us that the roots are expressible in terms of radicals and members of
$k$. To derive explicit formulas for the roots, the idea is to use a two-step process
with Lagrange resolvents, arguing as in the proof of Corollary 9.48 at each step.

The first step involves passing from $k$ to $L$. The square roots of 1 are already
in $k$, and $L$ is to be obtained from $k$ by adjoining one of the square roots of
some element of $k$. In Proposition 9.47 the Galois group $\mathrm{Gal}(L/k)$ is a 2-element
quotient group, the sum is over members of the quotient group, and the element $x$
is in $L$. It is a little more convenient to pull the sum back to one over the 6-element
symmetric group, taking $\omega$ to be the sign function on $\mathfrak{S}_3$ and taking $x$ to be any
element of $K$. The formulas for the projection operators $E_0$ and $E_1$ are then

$$E_0 x = \tfrac16 \sum_{\sigma \in \mathfrak{S}_3} \sigma(x),\qquad E_1 x = \tfrac16 \sum_{\sigma \in \mathfrak{S}_3} (\operatorname{sgn}\sigma)\,\sigma(x),$$

15We  discuss  only  Cardan's  cubic  formula,  omitting  any  discussion  of  the  corresponding  quartic 
formula,  which  often  bears  Cardan’s  name  and  which  can  be  handled  with  the  same  techniques.  See 
Van  der  Waerden,  Vol.  I,  Section  58,  for  details. 
with $x$ in $K$, and the proof of Corollary 9.48 tells us to adjoin to $k$ a nonzero element
of $\operatorname{image} E_1$, i.e., a nonzero $x$ with $\sigma(x) = (\operatorname{sgn}\sigma)x$ for all $\sigma$ in
$\mathfrak{S}_3$; such an $x$ is a square root of the element $x^2$ of $k$.

The only elements of $K$ for which we have good control of the action of the
Galois group, apart from the elements of $k$, are the elements that are expressed
directly in terms of the roots $r_1, r_2, r_3$ of the polynomial. By renumbering the
roots if necessary, we may assume that the roots are permuted by $\mathfrak{S}_3$ according to
their subscripts. An example of a polynomial function of $r_1, r_2, r_3$ that transforms
according to the sign of the permutation played a role in Section I.4 in defining
the sign of a permutation. It is the difference product of the polynomial, namely

$$\prod_{1 \le i < j \le 3} (r_j - r_i).$$

This is a square root of the discriminant $D$ of the polynomial, which is given by

$$D = \prod_{1 \le i < j \le 3} (r_j - r_i)^2.$$

We shall compute $D$ in terms of the coefficients of the cubic shortly. In the
meantime, the proof of Corollary 9.48 tells us that $L = k(\sqrt{D})$. Here $\sqrt{D}$
is given by

$$\begin{aligned}
\sqrt{D} &= (r_3 - r_2)(r_3 - r_1)(r_2 - r_1)\\
&= (r_1r_2^2 + r_2r_3^2 + r_3r_1^2) - (r_1^2r_2 + r_2^2r_3 + r_3^2r_1).
\end{aligned}$$

The second step is to pass from $L$ to $K$. Corollary 9.48 says to expect $K$
to be obtained by adjoining the cube root of something if the cube roots of 1
are already present in $L$. The proof of the second half of Theorem 9.44, which
follows Corollary 9.48, indicates how we can incorporate the cube roots of 1 into
the fields in order to have a root tower. What we can do is to replace $k$ at the start
by a splitting field for $\prod_{1 \le r \le 3} (X^r - 1)$. Since $\pm 1$ are already in $k$, we are to
adjoin the nontrivial cube roots of 1, i.e., the roots of $X^2 + X + 1$, if they are not
already present. In other words, what we do is replace $k$ at the start by $k(\sqrt{-3})$.
Changing notation, we assume that $\sqrt{-3}$ lies in $k$ from the outset.

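We can now use Lagrange resolvents. Let $\sigma$ be the generator $(1\ 2\ 3)$ of $\mathfrak{A}_3$,
sending $r_1$ to $r_2$, $r_2$ to $r_3$, and $r_3$ to $r_1$. Let $\omega = \tfrac12(-1 + \sqrt{-3})$ be a primitive
cube root of 1. Then we have

$$\begin{aligned}
E_0 x &= \tfrac13\big(x + \sigma x + \sigma^2 x\big),\\
E_1 x &= \tfrac13\big(x + \omega^{-1}\sigma x + \omega^{-2}\sigma^2 x\big),\\
E_2 x &= \tfrac13\big(x + \omega^{-2}\sigma x + \omega^{-1}\sigma^2 x\big).
\end{aligned}$$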
Again we can use any $x$, but the roots of the cubic are the simplest nontrivial
elements for which we know the action of $\sigma$. Corollary 9.48 shows that $K =
L(E_1x)$ if $E_1x \ne 0$. Proposition 9.47c says that $\sigma(E_1x) = \omega E_1x$, so that $(E_1x)^3$ is fixed by $\sigma$, and it
therefore lies in $L$. Hence $K$ is identified as obtained from $L$ by adjoining a cube
root of the element $(E_1x)^3$ of $L$.

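Taking $x = r_1$, we have $\sigma x = r_2$ and $\sigma^2 x = r_3$. Also, $\omega^{\pm1} = \tfrac12(-1 \pm \sqrt{-3})$.
Using the formula for $E_1x$ and substituting for $\sqrt{D}$ and $\omega^{\pm1}$ then gives

$$\begin{aligned}
(3E_1r_1)^3 &= r_1^3 + r_2^3 + r_3^3 + 6r_1r_2r_3\\
&\qquad + 3\omega^{-1}(r_1^2r_2 + r_2^2r_3 + r_3^2r_1) + 3\omega(r_1r_2^2 + r_2r_3^2 + r_3r_1^2)\\
&= \sum_i r_i^3 + 6r_1r_2r_3 - \tfrac32 \sum_{i \ne j} r_i^2r_j + \tfrac32\sqrt{-3}\,\sqrt{D}.
\end{aligned}$$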

To proceed further, we shall want to substitute expressions involving the coefficients of the cubic for the above symmetric expressions in the roots.16 These
expressions will be considerably simplified if we assume that the coefficient of
$X^2$ in the cubic is 0. We know that this assumption involves no loss of generality.
Thus we assume for the remainder of this section that the cubic is $X^3 + pX + q$.
The relevant formulas relating the roots and the coefficients are

$$\begin{aligned}
r_1 + r_2 + r_3 &= 0,\\
r_1r_2 + r_1r_3 + r_2r_3 &= p,\\
r_1r_2r_3 &= -q.
\end{aligned}$$


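Aiming for the right side of the displayed formula for $(3E_1r_1)^3$, we have

$$\begin{aligned}
0 &= (r_1 + r_2 + r_3)^3 = \sum_i r_i^3 + 3\sum_{i \ne j} r_i^2r_j + 6r_1r_2r_3,\\
0 &= -\tfrac92 (r_1 + r_2 + r_3)(r_1r_2 + r_1r_3 + r_2r_3) = -\tfrac92 \sum_{i \ne j} r_i^2r_j - \tfrac{27}{2} r_1r_2r_3,\\
-\tfrac{27}{2} q &= \tfrac{27}{2} r_1r_2r_3.
\end{aligned}$$

Addition of these three lines and comparison with the expression for $(3E_1r_1)^3$
yields

$$-\tfrac{27}{2} q = \sum_i r_i^3 - \tfrac32 \sum_{i \ne j} r_i^2r_j + 6r_1r_2r_3 = (3E_1r_1)^3 - \tfrac32\sqrt{-3}\,\sqrt{D}.$$

Consequently

$$(3E_1r_1)^3 = -\tfrac{27}{2} q + \tfrac32\sqrt{-3}\,\sqrt{D}.$$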
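The identity for $(3E_1r_1)^3$, and its counterpart for $(3E_2r_1)^3$ with the sign of the $\sqrt{-3}$ term reversed, can be spot-checked numerically. The check picks $r_1$ and $r_2$, sets $r_3 = -r_1 - r_2$ so that the cubic has no $X^2$ term, and compares both sides. In this Python sketch the function name is hypothetical. Following the conventions of the text, $\sqrt{-3}$ is taken to be `cmath.sqrt(-3)`, and $\sqrt{D}$ is taken to be the difference product $(r_3 - r_2)(r_3 - r_1)(r_2 - r_1)$.

```python
import cmath

def resolvent_cubes(r1, r2):
    """Return ((3E_1 r_1)^3, predicted value, (3E_2 r_1)^3, predicted value)."""
    r3 = -r1 - r2                                  # forces r1 + r2 + r3 = 0
    q = -r1 * r2 * r3
    s3 = cmath.sqrt(-3)
    omega = (-1 + s3) / 2                          # primitive cube root of 1
    sqrtD = (r3 - r2) * (r3 - r1) * (r2 - r1)      # difference product
    e1 = (r1 + omega**-1 * r2 + omega**-2 * r3) / 3
    e2 = (r1 + omega**-2 * r2 + omega**-1 * r3) / 3
    return ((3 * e1)**3, -27 * q / 2 + 1.5 * s3 * sqrtD,
            (3 * e2)**3, -27 * q / 2 - 1.5 * s3 * sqrtD)

print(resolvent_cubes(1, 2))
```

For instance, `resolvent_cubes(1, 2)` corresponds to $X^3 - 7X + 6$, whose roots are $1, 2, -3$. In each returned pair the two entries agree up to rounding.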

16Problems 36-39 at the end of Chapter VIII assure us that this rewriting is possible. For our
derivation  this  assurance  is  not  logically  necessary,  since  we  will  be  producing  explicit  formulas. 
Similarly

$$(3E_2r_1)^3 = -\tfrac{27}{2} q - \tfrac32\sqrt{-3}\,\sqrt{D}.$$

Since $3E_0r_1 = r_1 + r_2 + r_3 = 0$, we have expressions for $E_0r_1$, $E_1r_1$, and $E_2r_1$
apart from the choices of the cube roots. Proposition 9.47b says that we recover
$r_1$ by addition: $r_1 = E_0r_1 + E_1r_1 + E_2r_1$. Thus we have found a root explicitly
as soon as we sort out the ambiguity in the choices of cube roots and determine
the value of $D$ in terms of the coefficients $p$ and $q$.

Theorem 9.50 (Cardan's formula). Let $k$ be a field of characteristic 0 containing $\sqrt{-3}$, and let $X^3 + pX + q$ be an irreducible cubic in $k[X]$. For this
polynomial the discriminant $D$ is given by

$$D = -4p^3 - 27q^2.$$

The Galois group of a splitting field of the cubic is $\mathfrak{S}_3$ if $D$ is a nonsquare in $k$
and is $\mathfrak{A}_3$ if $D$ is a square in $k$. In either case, fix a square root of $D$, denote it by
$\sqrt{D}$, and let $\omega^{\pm1} = \tfrac12(-1 \pm \sqrt{-3})$ be the primitive cube roots of 1. Then it is
possible to determine cube roots of the form

$$3E_1r_1 = \sqrt[3]{-\tfrac{27}{2} q + \tfrac32\sqrt{-3}\,\sqrt{D}} \quad\text{and}\quad 3E_2r_1 = \sqrt[3]{-\tfrac{27}{2} q - \tfrac32\sqrt{-3}\,\sqrt{D}}$$

in such a way that their product is $(3E_1r_1)(3E_2r_1) = -3p$, and in this case the
three roots of $X^3 + pX + q$ are given by

$$\begin{aligned}
r_1 &= E_1r_1 + E_2r_1,\\
r_2 &= \omega E_1r_1 + \omega^2 E_2r_1,\\
r_3 &= \omega^2 E_1r_1 + \omega E_2r_1.
\end{aligned}$$
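Over $k = \mathbb{C}$, which certainly contains $\sqrt{-3}$, the theorem translates directly into a floating-point routine. The sketch below is illustrative and does not come from the text. The name `cardan_roots` and the choice of which cube root to extract first are assumptions. The routine fixes $\sqrt{-3}$ and $\sqrt{D}$ via `cmath.sqrt` and takes a principal complex cube root for one of $3E_1r_1$ and $3E_2r_1$. It then forces the other one through the relation $(3E_1r_1)(3E_2r_1) = -3p$, as the theorem prescribes.

```python
import cmath

def cardan_roots(p, q):
    """Roots of X^3 + pX + q over C, following Theorem 9.50 in floating point."""
    s3 = cmath.sqrt(-3)                      # a fixed square root of -3
    omega = (-1 + s3) / 2                    # primitive cube root of 1
    sqrtD = cmath.sqrt(-4 * p**3 - 27 * q**2)
    A = -27 * q / 2 + 1.5 * s3 * sqrtD       # should equal (3 E_1 r_1)^3
    B = -27 * q / 2 - 1.5 * s3 * sqrtD       # should equal (3 E_2 r_1)^3
    if A == 0 and B == 0:                    # happens only when p = q = 0
        return (0j, 0j, 0j)
    # Take the cube root of whichever of A, B is larger, then use u * v = -3p for the other;
    # this avoids dividing by a number that is zero or nearly so.
    if abs(A) >= abs(B):
        u = A ** (1 / 3)
        v = -3 * p / u
    else:
        v = B ** (1 / 3)
        u = -3 * p / v
    e1, e2 = u / 3, v / 3                    # E_1 r_1 and E_2 r_1
    return (e1 + e2, omega * e1 + omega**2 * e2, omega**2 * e1 + omega * e2)
```

Since the product of $A$ and $B$ is $(-3p)^3$, forcing $uv = -3p$ guarantees that $v^3 = B$ whenever $u^3 = A$ (and symmetrically). This is the numerical counterpart of "sorting out the ambiguity in the choices of cube roots." For example, `cardan_roots(-7, 6)` should return the roots $1, 2, -3$ of $X^3 - 7X + 6$, in some order and up to rounding.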


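PROOF. Define $\sigma_k = r_1^k + r_2^k + r_3^k$ for $1 \le k \le 4$. By inspection we have

$$\begin{pmatrix} 1 & 1 & 1\\ r_1 & r_2 & r_3\\ r_1^2 & r_2^2 & r_3^2 \end{pmatrix}
\begin{pmatrix} 1 & r_1 & r_1^2\\ 1 & r_2 & r_2^2\\ 1 & r_3 & r_3^2 \end{pmatrix}
= \begin{pmatrix} 3 & \sigma_1 & \sigma_2\\ \sigma_1 & \sigma_2 & \sigma_3\\ \sigma_2 & \sigma_3 & \sigma_4 \end{pmatrix}.$$

Taking the determinant of both sides and applying Corollary 5.3, we obtain

$$D = \det\begin{pmatrix} 3 & \sigma_1 & \sigma_2\\ \sigma_1 & \sigma_2 & \sigma_3\\ \sigma_2 & \sigma_3 & \sigma_4 \end{pmatrix}.$$

The given cubic shows that $\sigma_1 = r_1 + r_2 + r_3 = 0$, so that expanding the determinant gives $D = 3\sigma_2\sigma_4 - 3\sigma_3^2 - \sigma_2^3$. For the other $\sigma_k$'s, we have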
$$\begin{aligned}
\sigma_2 &= r_1^2 + r_2^2 + r_3^2 = (r_1 + r_2 + r_3)^2 - 2(r_1r_2 + r_1r_3 + r_2r_3) = -2p,\\
\sigma_3 &= r_1^3 + r_2^3 + r_3^3 = (r_1 + r_2 + r_3)(r_1^2 + r_2^2 + r_3^2)\\
&\qquad - (r_1^2r_2 + r_1^2r_3 + r_2^2r_1 + r_2^2r_3 + r_3^2r_1 + r_3^2r_2)\\
&= -(r_1 + r_2 + r_3)(r_1r_2 + r_1r_3 + r_2r_3) + 3r_1r_2r_3 = -3q,\\
\sigma_4 &= r_1^4 + r_2^4 + r_3^4 = (r_1^2 + r_2^2 + r_3^2)^2 - 2(r_1^2r_2^2 + r_1^2r_3^2 + r_2^2r_3^2)\\
&= (-2p)^2 - 2(r_1r_2 + r_1r_3 + r_2r_3)^2 + 4r_1r_2r_3(r_1 + r_2 + r_3) = (-2p)^2 - 2p^2 = 2p^2.
\end{aligned}$$

Substituting, we obtain $D = -12p^3 + 8p^3 - 27q^2 = -4p^3 - 27q^2$. This proves
the formula for $D$. In particular, it confirms that $D$ lies in $k$.
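The determinant computation in the proof can be confirmed in exact integer arithmetic. The sketch below uses a hypothetical function name. It builds a monic cubic with prescribed integer roots summing to 0 and computes the discriminant in three ways: as a product of squared differences, as the determinant of the matrix of power sums $\sigma_k$, and from $-4p^3 - 27q^2$.

```python
def discriminant_three_ways(r1, r2):
    """Discriminant of (X - r1)(X - r2)(X - r3) with r3 = -r1 - r2, computed three ways."""
    r3 = -r1 - r2
    roots = (r1, r2, r3)
    p = r1 * r2 + r1 * r3 + r2 * r3
    q = -r1 * r2 * r3
    from_roots = ((r2 - r1) * (r3 - r1) * (r3 - r2)) ** 2
    s = [sum(r**k for r in roots) for k in range(5)]      # s[0] = 3 and s[k] = sigma_k
    m = [[s[0], s[1], s[2]], [s[1], s[2], s[3]], [s[2], s[3], s[4]]]
    from_det = (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
                - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
                + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))
    from_formula = -4 * p**3 - 27 * q**2
    return from_roots, from_det, from_formula

print(discriminant_three_ways(1, 2))  # → (400, 400, 400)
```

Because the inputs are integers, all three values are computed exactly, so they can be compared with `==` rather than with a tolerance.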

The Galois group of the splitting field of the polynomial must be $\mathfrak{S}_3$ or $\mathfrak{A}_3$. If
it is $\mathfrak{S}_3$, then we saw above that $L = k(\sqrt{D})$ and that $[L : k] = 2$. Hence $D$ is a
nonsquare in $k$. If the Galois group is $\mathfrak{A}_3$, then $(r_3 - r_2)(r_3 - r_1)(r_2 - r_1)$ is fixed
by the Galois group and lies in $k$. The square of this element is $D$, and hence $D$
is a square in $k$.

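With either Galois group the calculations with the cubic extension that precede
the statement of the theorem are valid. If $r_1$ is one of the roots, then we know that

$$\begin{aligned}
r_1 &= E_0r_1 + E_1r_1 + E_2r_1 = E_1r_1 + E_2r_1,\\
(3E_1r_1)^3 &= -\tfrac{27}{2} q + \tfrac32\sqrt{-3}\,\sqrt{D},\\
(3E_2r_1)^3 &= -\tfrac{27}{2} q - \tfrac32\sqrt{-3}\,\sqrt{D}.
\end{aligned}$$

The uniqueness of simple extensions (Theorem 9.11) says that we can make any
choice of cube root to determine $3E_1r_1$. Then

$$\begin{aligned}
(3E_1r_1)(3E_2r_1) &= (r_1 + \omega^{-1}\sigma r_1 + \omega^{-2}\sigma^2 r_1)(r_1 + \omega^{-2}\sigma r_1 + \omega^{-1}\sigma^2 r_1)\\
&= (r_1 + \omega^{-1}r_2 + \omega r_3)(r_1 + \omega r_2 + \omega^{-1}r_3)\\
&= (r_1^2 + r_2^2 + r_3^2) + (\omega + \omega^{-1})(r_1r_2 + r_1r_3 + r_2r_3)\\
&= (r_1^2 + r_2^2 + r_3^2) - (r_1r_2 + r_1r_3 + r_2r_3).
\end{aligned}$$

The first term on the right side we calculated in the first paragraph of the proof
as $\sigma_2 = -2p$, and the second term gives $-p$. Thus $(3E_1r_1)(3E_2r_1) = -3p$ as
asserted. Since $\sigma$ operates on $\operatorname{image} E_1$ as multiplication by $\omega$ and on $\operatorname{image} E_2$
as multiplication by $\omega^2$, the fact that $r_1 = E_1r_1 + E_2r_1$ implies that

$$r_2 = \sigma(r_1) = \omega E_1r_1 + \omega^2 E_2r_1 \quad\text{and}\quad r_3 = \sigma^2(r_1) = \omega^2 E_1r_1 + \omega E_2r_1.$$

This completes the proof. □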
14. Proof That π Is Transcendental

In  this  section  and  the  next  three,  we  combine  Galois  theory  with  some  of  the 
ring  theory  in  the  second  half  of  Chapter  VIII.  This  combination  will  allow  us  to 
prove  some  striking  theorems,  see  how  Galois  groups  can  be  used  effectively  in 
practice,  and  develop  some  techniques  for  identifying  Galois  groups  explicitly. 

The  present  section  is  devoted  to  the  proof  of  the  following  theorem. 

Theorem 9.51 (Lindemann, 1882). The number π is transcendental over Q.

The argument we give is based on that in a book by L. K. Hua.17 For purposes
of having a precise theorem, π is defined as the least positive real number such
that $e^{\pi i} = -1$. In addition to Galois theory in the form of Proposition 9.35,
the proof here will make use of a few facts about algebraic integers. Algebraic
integers were defined in Section VIII.1 and again in Section VIII.9 (as well as in
Section VII.4) as complex numbers that are roots of monic polynomials in Z[X].
The algebraic integers form a ring by Corollary 8.38 (or alternatively by Lemma
7.30), the only algebraic integers in Q are the members of Z by Proposition 8.41
(or alternatively by Lemma 7.30), and any algebraic number x has the property
that nx is an algebraic integer for some integer $n \neq 0$ by Proposition 8.42.

We  begin  with  a  lemma. 

Lemma 9.52. Let f(X) in C[X] be given by $f(X) = \sum_{k=0}^{n} a_k X^k$, and define
F(X) to be the sum of the derivatives of f(X):

$$F(X) = \sum_{l=0}^{n} f^{(l)}(X).$$

If Q(z) is defined as $Q(z) = F(0)e^z - F(z)$ for $z \in \mathbb{C}$, then $F(0) = \sum_{k=0}^{n} a_k\, k!$
and

$$|Q(z)| \le e^{|z|} \sum_{k=0}^{n} |a_k|\,|z|^k.$$

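PROOF. We calculate directly that

$$F(z) = \sum_{l=0}^{n} \sum_{k=l}^{n} a_k \frac{k!}{(k-l)!} z^{k-l} = \sum_{k=0}^{n} \sum_{l=0}^{k} a_k \frac{k!}{(k-l)!} z^{k-l} = \sum_{k=0}^{n} \sum_{l=0}^{k} a_k \frac{k!}{l!} z^{l}.$$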


17 Introduction to Number Theory, pp. 484–488. In the same pages Hua establishes the earlier
theorem of Hermite that e is transcendental, using a related but simpler argument.
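Evaluation at z = 0 gives $F(0) = \sum_{k=0}^{n} a_k\, k!$. Then

$$\begin{aligned}
|Q(z)| &= \Big| \sum_{k=0}^{n} a_k \sum_{l=0}^{\infty} \frac{k!}{l!} z^l - \sum_{k=0}^{n} a_k \sum_{l=0}^{k} \frac{k!}{l!} z^l \Big|
= \Big| \sum_{k=0}^{n} a_k \sum_{l=k+1}^{\infty} \frac{k!}{l!} z^l \Big| \\
&\le \sum_{k=0}^{n} |a_k| \sum_{l=k+1}^{\infty} \frac{|z|^l}{(l-k)!} \qquad \text{since } \binom{l}{k} \ge 1 \\
&= \sum_{k=0}^{n} |a_k|\,|z|^k \sum_{m=1}^{\infty} \frac{|z|^m}{m!}
\le e^{|z|} \sum_{k=0}^{n} |a_k|\,|z|^k.
\end{aligned}$$

□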
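The lemma can be spot-checked numerically. The Python sketch below is an illustration under stated assumptions: polynomials are coefficient lists with `coeffs[k]` the coefficient of $X^k$, F is formed by literally summing all derivatives, and $|Q(z)|$ is compared with $e^{|z|}\sum_k |a_k|\,|z|^k$ in floating point. The polynomial $1 - 2X + 3X^2$ and the point $z = 1.5 + 0.5i$ are arbitrary choices, and the function names are hypothetical.

```python
import cmath
import math

def derivative(coeffs):
    """Coefficients of f'(X), where coeffs[k] is the coefficient of X^k."""
    return [k * c for k, c in enumerate(coeffs)][1:]

def evaluate(coeffs, z):
    """Evaluate the polynomial at z by Horner's rule."""
    result = 0
    for c in reversed(coeffs):
        result = result * z + c
    return result

def F_value(coeffs, z):
    """F(z) = f(z) + f'(z) + f''(z) + ..., summing every derivative of f."""
    total, current = 0, list(coeffs)
    while current:
        total += evaluate(current, z)
        current = derivative(current)
    return total

def lemma_check(coeffs, z):
    """Return (F(0), |Q(z)|, e^{|z|} * sum_k |a_k| |z|^k) with Q(z) = F(0)e^z - F(z)."""
    F0 = F_value(coeffs, 0)
    Q = F0 * cmath.exp(z) - F_value(coeffs, z)
    bound = math.exp(abs(z)) * sum(abs(a) * abs(z) ** k for k, a in enumerate(coeffs))
    return F0, abs(Q), bound

# Illustrative polynomial f(X) = 1 - 2X + 3X^2, so sum_k a_k k! = 1 - 2 + 6 = 5.
F0, absQ, bound = lemma_check([1, -2, 3], 1.5 + 0.5j)
print(F0, absQ <= bound)   # 5 True
```

The value $F(0) = 5$ agrees with the formula $\sum_k a_k\,k!$; the bound is far from sharp for this example, which is all the proof needs.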

Proof of Theorem 9.51. Arguing by contradiction, suppose that π is
algebraic over Q, so that α = πi is algebraic over Q as well. Let M(X) be the
minimal polynomial of α over Q, and let K be the splitting field of M(X) in C.
This exists since C is algebraically closed. We write $\alpha_1, \dots, \alpha_m$ for the roots of
M(X) in K, with $\alpha_1 = \alpha$. These are distinct algebraic numbers, and they are
permuted by the Galois group $G = \mathrm{Gal}(K/\mathbb{Q})$. What we shall show is that

$$R = \prod_{j=1}^{m} \big(1 + e^{\alpha_j}\big) \neq 0.$$

This will be a contradiction since $1 + e^{\alpha_1} = 0$ for $\alpha_1 = \pi i$.

We expand the product defining R, obtaining

$$R = 1 + \sum_{j} e^{\alpha_j} + \sum_{j<k} e^{\alpha_j + \alpha_k} + \cdots.$$

Whenever one of the exponentials has total exponent 0, we lump that term with
the constant 1. Otherwise we write the term as $e^{\beta_l}$, allowing repetitions among the
terms $e^{\beta_l}$. Thus

$$R = N + e^{\beta_1} + e^{\beta_2} + \cdots + e^{\beta_r},$$

with N an integer $\ge 1$, with each $\beta_l \neq 0$, and with $N + r = 2^m$.
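The bookkeeping behind $N + r = 2^m$ can be illustrated by expanding the product over all $2^m$ subsets of $\{\alpha_1, \dots, \alpha_m\}$ and sorting the subset sums into zero and nonzero ones. The Python sketch below is hypothetical in that it uses floating-point complex numbers with a tolerance in place of exact algebraic numbers; the sample input $\pm\pi i$ (the roots of $X^2 + \pi^2$) and the helper name `split_exponents` are only for illustration.

```python
import cmath
from itertools import combinations

def split_exponents(alphas, tol=1e-12):
    """Expand prod_j (1 + e^{alpha_j}) as a sum over subsets of alphas.

    Returns (N, betas): N counts subsets whose exponents sum to 0 (including
    the empty subset), and betas lists the nonzero sums, repetitions allowed.
    """
    N, betas = 0, []
    for size in range(len(alphas) + 1):
        for subset in combinations(alphas, size):
            s = sum(subset, 0j)
            if abs(s) < tol:
                N += 1
            else:
                betas.append(s)
    return N, betas

alphas = [cmath.pi * 1j, -cmath.pi * 1j]
N, betas = split_exponents(alphas)
print(N, len(betas), N + len(betas) == 2 ** len(alphas))   # 2 2 True

# The expanded form N + sum e^{beta_l} agrees with the product (here both are 0).
product = 1
for a in alphas:
    product *= 1 + cmath.exp(a)
expanded = N + sum(cmath.exp(b) for b in betas)
print(abs(product - expanded) < 1e-9)
```

In this toy case the product vanishes because $1 + e^{\pi i} = 0$, which is exactly the situation the proof shows to be impossible once π is assumed algebraic.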

Each member of G = Gal(K/Q) permutes $\alpha_1, \dots, \alpha_m$, and it therefore permutes
the $\beta_l$'s that are single $\alpha_j$'s, permutes the $\beta_l$'s that are the nonzero sums of
two $\alpha_j$'s, permutes the $\beta_l$'s that are the nonzero sums of three $\alpha_j$'s, and so on.

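Choose an integer a > 0 such that $a\alpha_1, \dots, a\alpha_m$ are algebraic integers; since the
algebraic integers form a ring, each $a\beta_l$ is then an algebraic integer as well. Let p
be a prime number large enough to satisfy some conditions to be specified shortly,
and define

$$f(X) = \frac{(aX)^{p-1}}{(p-1)!} \prod_{l=1}^{r} (aX - a\beta_l)^p = \sum_{k=0}^{n} a_k X^k,$$

where $n = (r+1)p - 1$ is the degree of f.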
The members σ of G act on f(X) as usual by acting on the coefficients. Each $\beta_l$
that is the nonzero sum of a certain number of $\alpha_j$'s is sent into another $\beta_{l'}$ of the
same kind, and thus σ just permutes the factors of the product defining f, leaving
f(X) unchanged. The coefficients of $(p-1)!\,f(a^{-1}X)$ are algebraic integers in
K. Being fixed by G, they are in Q by Proposition 9.35d, and hence they are in
Z. Therefore

$$f(X) = \frac{A_{p-1}a^{p-1}X^{p-1} + A_p a^p X^p + \cdots}{(p-1)!}$$

with $A_{p-1}, A_p, \dots$ in Z. Since $A_{p-1} = \prod_{l=1}^{r} (-a\beta_l)^p$, we can arrange that p does
not divide $A_{p-1}a^{p-1}$ by choosing p greater than a and greater than $\big|\prod_{l=1}^{r} (a\beta_l)\big|$.
If we look at the lth factor in the product defining f(X), we see that $(X - \beta_l)^p$
divides f(X) in K[X]. Therefore we have further formulas for f(X), namely

$$f(X) = \frac{\gamma_{p,l}(X - \beta_l)^p + \gamma_{p+1,l}(X - \beta_l)^{p+1} + \cdots}{(p-1)!} \qquad \text{for } 1 \le l \le r.$$

As in Lemma 9.52, we define

$$F(X) = \sum_{l=0}^{n} f^{(l)}(X)$$

and

$$Q(z) = F(0)e^z - F(z).$$


Then we have $F(0) = \sum_{k=0}^{n} a_k\, k!$. For $1 \le l \le r$, the definition of Q(z) gives
$F(0)e^{\beta_l} = F(\beta_l) + Q(\beta_l)$. Substituting from the definition of R, we obtain

$$F(0)R = F(0)\Big(N + \sum_{l=1}^{r} e^{\beta_l}\Big) = NF(0) + \sum_{l=1}^{r} F(\beta_l) + \sum_{l=1}^{r} Q(\beta_l). \qquad (*)$$

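A further condition that we impose on the size of p is that p > N. Then the
computation

$$NF(0) = N \sum_{k=0}^{n} a_k\, k! = N\big(A_{p-1}a^{p-1} + pA_p a^p + p(p+1)A_{p+1}a^{p+1} + \cdots\big)$$

and the properties of $A_{p-1}, A_p, \dots$ together imply that NF(0) is an integer and
is not divisible by p. (Here $a_k = 0$ for $k < p - 1$, and for $k \ge p - 1$ we have
$a_k\, k! = A_k a^k\, k!/(p-1)!$, where $k!/(p-1)!$ equals 1 for $k = p - 1$ and is an
integer divisible by p for $k \ge p$.)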

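Let us compute $F(\beta_l)$. The derivatives through order $p - 1$ of f(X) are 0 at
$\beta_l$. For the pth derivative we have

$$p\gamma_{p,l} = f^{(p)}(\beta_l) = pA_p a^p + \sum_{j \ge 1} \frac{(p+j)\cdots(j+1)}{(p-1)!} A_{p+j} a^{p+j} \beta_l^{\,j}.$$

The coefficient of $A_{p+j}a^{p+j}\beta_l^{\,j}$ inside the sum equals

$$\frac{(p+j)\cdots(j+1)}{(p-1)!} = \frac{(p+j)\cdots(j+1)\, j!\, p}{p!\, j!} = p\binom{p+j}{j},$$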
and thus

$$p\gamma_{p,l} = f^{(p)}(\beta_l) = a^p\Big(pA_p + \sum_{j \ge 1} p\binom{p+j}{j} A_{p+j}\, (a\beta_l)^j\Big).$$

The higher-order derivatives are computed and simplified similarly. For the
$(p + k)$th derivative with $k \ge 1$, we find that

$$\begin{aligned}
(p+k)\cdots(p+1)p\,\gamma_{p+k,l} &= f^{(p+k)}(\beta_l) \\
&= a^{p+k}\Big((p+k)\cdots(p+1)p\,A_{p+k} \\
&\qquad + \sum_{j \ge 1} (p+k)\cdots(p+1)p\binom{p+k+j}{j} A_{p+k+j}\, (a\beta_l)^j\Big). \qquad (**)
\end{aligned}$$


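Put $C_{p+k} = \sum_{l=1}^{r} \gamma_{p+k,l}$ for $k \ge 0$; note that (**) holds also for k = 0, by the
formula for $p\gamma_{p,l}$ above. Summing the left and right members of (**) over l
gives

$$C_{p+k} = a^{p+k}\Big(rA_{p+k} + \sum_{j \ge 1} \binom{p+k+j}{j} A_{p+k+j} \sum_{l=1}^{r} (a\beta_l)^j\Big).$$

The sum $\sum_{l=1}^{r} (a\beta_l)^j$ is an algebraic integer fixed by G, and it is therefore an
integer. Consequently each $C_{p+k}$ is an integer. Summing the left and middle
members of (**) over k and l gives

$$\sum_{l=1}^{r} F(\beta_l) = \sum_{k \ge 0} (p+k)\cdots(p+1)p\,C_{p+k},$$

and this is an integer divisible by p.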

Since NF(0) is an integer not divisible by p, $NF(0) + \sum_{l=1}^{r} F(\beta_l)$ is an
integer not divisible by p, and we have

$$\Big|NF(0) + \sum_{l=1}^{r} F(\beta_l)\Big| \ge 1.$$

In view of (*), we will have a contradiction to R = 0 if we show that

$$\Big|\sum_{l=1}^{r} Q(\beta_l)\Big| < 1.$$

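An easy argument by induction on m shows that if $\sum_{k=0}^{m} d_k z^k = \prod_{j=1}^{m} (z - c_j)$,
then $\sum_{k=0}^{m} |d_k|\,|z|^k \le \prod_{j=1}^{m} (|z| + |c_j|)$. Applying this observation to the sum and
product defining f(X) and using Lemma 9.52, we see that

$$e^{-|z|}\,|Q(z)| \le \sum_{k=0}^{n} |a_k|\,|z|^k \le \frac{|az|^{p-1}}{(p-1)!} \prod_{l=1}^{r} \big(|az| + |a\beta_l|\big)^p.$$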
For each fixed z, the right side is, apart from a factor independent of p, the $(p-1)$st
term of the convergent series for an exponential function at an appropriate point, and
hence the right side is less than $r^{-1}e^{-|z|}$ for p sufficiently large, p depending on z.
Choosing p large enough to make the right side less than $r^{-1}e^{-|z|}$ for
$z = \beta_1, \dots, \beta_r$ and summing over these z's, we obtain $\big|\sum_{l=1}^{r} Q(\beta_l)\big| < 1$, and we
have arrived at the contradiction we anticipated. □
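Two ingredients of this last step can be illustrated numerically. The Python sketch below, with hypothetical names, (i) expands $\prod_j (z - c_j)$ and tests the coefficient bound $\sum_k |d_k|\,r^k \le \prod_j (r + |c_j|)$ at a few radii for randomly chosen complex $c_j$, and (ii) evaluates $B\,x^{p-1}/(p-1)!$, the shape of the final bound with $B = \prod_l (|az| + |a\beta_l|)$ and $x = |az|B$, to show that it eventually collapses toward 0 as p grows. The values $x = 20$ and $B = 5$ are arbitrary stand-ins, and p here is any positive integer, not necessarily a prime.

```python
import math
import random

def poly_from_roots(roots):
    """Coefficients d_0, ..., d_m (lowest degree first) of prod_j (z - c_j)."""
    coeffs = [1 + 0j]
    for c in roots:
        new = [0j] * (len(coeffs) + 1)
        for k, d in enumerate(coeffs):
            new[k + 1] += d      # z * d_k z^k
            new[k] -= c * d      # -c_j * d_k z^k
        coeffs = new
    return coeffs

def coefficient_bound_holds(roots, radius):
    """Test sum_k |d_k| r^k <= prod_j (r + |c_j|) at r = radius (small float slack)."""
    lhs = sum(abs(d) * radius ** k for k, d in enumerate(poly_from_roots(roots)))
    rhs = math.prod(radius + abs(c) for c in roots)
    return lhs <= rhs * (1 + 1e-12)

def tail_term(x, B, p):
    """B * x**(p-1) / (p-1)!, computed in log space to avoid overflow (x, B > 0)."""
    return math.exp(math.log(B) + (p - 1) * math.log(x) - math.lgamma(p))

random.seed(0)
roots = [complex(random.uniform(-3, 3), random.uniform(-3, 3)) for _ in range(5)]
print(all(coefficient_bound_holds(roots, r) for r in (0.1, 1.0, 2.5, 10.0)))
# The terms first grow and then collapse toward 0 as p increases.
print([f"{tail_term(20.0, 5.0, p):.3g}" for p in (10, 50, 100, 200)])
```

Working with logarithms via `math.lgamma` (where $\log\Gamma(p) = \log (p-1)!$) is just a design choice that avoids converting huge factorials to floats.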


15.  Norm  and  Trace 

This  is  the  second  of  four  sections  in  which  we  combine  Galois  theory  with 
some  of  the  ring  theory  in  the  second  half  of  Chapter  VIII.  We  shall  make  use 
of  a  little  more  linear  algebra  than  we  have  used  thus  far  in  this  chapter,  and  we 
shall  conclude  the  section  by  completing  the  proof  of  Theorem  8.54  concerning 
extensions  of  Dedekind  domains. 

Let  k  be  a  field,  not  necessarily  of  characteristic  0,  and  let  K  be  a  finite 
algebraic  extension.  We  take  advantage  of  the  fact  that  K  is  a  vector  space  over 
k. If a is in K, let us write M(a) for the k linear mapping from K to K given by
multiplication by a. The characteristic polynomial $\det(XI - M(a))$ is called the
field polynomial of a and is a monic polynomial in k[X] of degree [K : k]. The
norm and trace of a relative to K/k are defined to be the determinant and trace
of the linear mapping M(a). In symbols,

$$N_{K/k}(a) = \det(M(a)), \qquad \mathrm{Tr}_{K/k}(a) = \mathrm{Tr}(M(a)).$$

Both $N_{K/k}$ and $\mathrm{Tr}_{K/k}$ are functions from K to k. If n = [K : k], then $N_{K/k}(a)$
is $(-1)^n$ times the constant term of $\det(XI - M(a))$, and $\mathrm{Tr}_{K/k}(a)$ is minus the
coefficient of $X^{n-1}$. The subscript K/k may be omitted when there is no chance
of ambiguity.

Example. $k = \mathbb{Q}$, $K = \mathbb{Q}(\sqrt{2})$, $a = \sqrt{2}$. If we use $\Gamma = (1, \sqrt{2})$ as an
ordered basis of K over k, then the matrix of M(a) relative to Γ is
$\begin{pmatrix} 0 & 2 \\ 1 & 0 \end{pmatrix}$. Since characteristic polynomials are independent of the choice of basis,
the field polynomial of a can be computed in this basis and is given by

$$\det(XI - M(a)) = \det\begin{pmatrix} X & -2 \\ -1 & X \end{pmatrix} = X^2 - 2.$$

We can read off the norm and trace as N(a) = -2 and Tr(a) = 0.
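The same computation works for any element $x + y\sqrt{d}$ of a quadratic field. The Python sketch below is a minimal illustration, assuming d is a nonsquare integer and the ordered basis is $(1, \sqrt{d})$; it builds the matrix of M(a) column by column, reads off the norm and trace as determinant and trace, and forms the field polynomial $X^2 - \mathrm{Tr}(a)X + N(a)$. The function names are hypothetical.

```python
from fractions import Fraction

def mult_matrix(x, y, d):
    """Matrix of M(a) for a = x + y*sqrt(d) relative to the ordered basis (1, sqrt(d)).

    Column 1: a * 1       = x   + y*sqrt(d)
    Column 2: a * sqrt(d) = d*y + x*sqrt(d)
    """
    return [[x, d * y],
            [y, x]]

def norm_and_trace(x, y, d):
    """(N(a), Tr(a)) as the determinant and trace of M(a)."""
    m = mult_matrix(x, y, d)
    return m[0][0] * m[1][1] - m[0][1] * m[1][0], m[0][0] + m[1][1]

def field_polynomial(x, y, d):
    """Coefficients (1, c_1, c_0) of det(XI - M(a)) = X^2 - Tr(a) X + N(a)."""
    n, t = norm_and_trace(x, y, d)
    return (1, -t, n)

print(mult_matrix(0, 1, 2))        # [[0, 2], [1, 0]]
print(norm_and_trace(0, 1, 2))     # (-2, 0)
print(field_polynomial(0, 1, 2))   # (1, 0, -2)
print(norm_and_trace(Fraction(1, 2), Fraction(3, 2), 2))
```

The last line shows that rational coordinates work unchanged with `Fraction`, giving $N(\tfrac12 + \tfrac32\sqrt2) = -\tfrac{17}{4}$ and trace 1.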
Proposition 9.53. If K/k is a finite extension of fields with n = [K : k], then
norms and traces relative to K/k have the following properties:

(a) N(ab) = N(a)N(b),

(b) $N(ca) = c^n N(a)$ for c ∈ k,

(c) N(1) = 1, and consequently $N(c) = c^n$ for c ∈ k,

(d) Tr(a + b) = Tr(a) + Tr(b),

(e) Tr(ca) = c Tr(a) for c ∈ k,

(f) Tr(1) = n, and consequently Tr(c) = nc for c ∈ k.

PROOF. Properties (a) and (b) follow from properties of the determinant in
combination with the identities M(ab) = M(a)M(b) and M(ca) = cM(a).
Properties (d) and (e) follow from properties of the trace in combination with the
identities M(a + b) = M(a) + M(b) and M(ca) = cM(a). Since M(1) is the
identity, the norm and trace of 1 are 1 and n, respectively. The other conclusions
in (c) and (f) are then consequences of this fact in combination with (b) and (e).

□
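As a sanity check, the properties in Proposition 9.53 can be verified exhaustively on a small sample of elements of $K = \mathbb{Q}(\sqrt{2})$, where n = 2 and the matrix of multiplication by $x + y\sqrt2$ is $\begin{pmatrix} x & 2y \\ y & x\end{pmatrix}$ as in the example above. The Python sketch below is illustrative only; the sample grid, the scalar $c = 5/7$, and all function names are arbitrary assumptions, and finitely many checks prove nothing in general.

```python
from fractions import Fraction
from itertools import product

D = 2  # K = Q(sqrt(2)) with basis (1, sqrt(2)), so n = [K : Q] = 2
n = 2

def matrix(a):
    """Matrix of multiplication by a = (x, y), meaning x + y*sqrt(D)."""
    x, y = a
    return [[x, D * y], [y, x]]

def N(a):
    m = matrix(a)
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def Tr(a):
    m = matrix(a)
    return m[0][0] + m[1][1]

def mul(a, b):
    (x1, y1), (x2, y2) = a, b
    return (x1 * x2 + D * y1 * y2, x1 * y2 + x2 * y1)

def add(a, b):
    return (a[0] + b[0], a[1] + b[1])

def scale(c, a):
    return (c * a[0], c * a[1])

samples = [(Fraction(s), Fraction(t, 3)) for s, t in product(range(-2, 3), repeat=2)]
for a, b in product(samples, repeat=2):
    assert N(mul(a, b)) == N(a) * N(b)          # N(ab) = N(a)N(b)
    assert Tr(add(a, b)) == Tr(a) + Tr(b)       # Tr(a + b) = Tr(a) + Tr(b)
c = Fraction(5, 7)
for a in samples:
    assert N(scale(c, a)) == c ** n * N(a)      # N(ca) = c^n N(a)
    assert Tr(scale(c, a)) == c * Tr(a)         # Tr(ca) = c Tr(a)
one = (Fraction(1), Fraction(0))
assert N(one) == 1 and Tr(one) == n             # N(1) = 1 and Tr(1) = n
print("Proposition 9.53 checks passed")
```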


Proposition 9.54. Let K/k and L/K be finite extensions of fields with
[K : k] = n and [L : K] = m, and let a be in K. The element a acts by
multiplication on K and also on L, yielding k linear maps in each case that will
be denoted by $M_{K/k}(a)$ and $M_{L/k}(a)$. Then in suitable ordered vector-space bases
the matrix of $M_{L/k}(a)$ is block diagonal, each block being the matrix of $M_{K/k}(a)$.

PROOF. We choose the bases as in Theorem 7.6. Thus let $\Gamma = (\omega_1, \omega_2, \dots)$
be an ordered basis of K over k, and let $\Delta = (\xi_1, \xi_2, \dots)$ be a basis of L over K.
Theorem 7.6 observes that the mn products $\xi_i\omega_j$ form a basis of L over k, and we
make this set into an ordered basis Ω by saying that $(i_1, j_1) < (i_2, j_2)$ if $i_1 < i_2$
or if $i_1 = i_2$ and $j_1 < j_2$. Let $M_{K/k}(a)\omega_j = \sum_i c_{ij}\omega_i$. Then

$$M_{L/k}(a)(\xi_l\omega_j) = \Big(\sum_{i=1}^{n} c_{ij}\omega_i\Big)\xi_l = \sum_{k=1}^{m}\sum_{i=1}^{n} (\delta_{kl}c_{ij})\,\xi_k\omega_i,$$

where $\delta_{kl}$ is 1 when k = l and is 0 otherwise. The matrix of $M_{L/k}(a)$ relative to Ω has
((k, i), (l, j))th entry $\delta_{kl}c_{ij}$, and this is 0 unless the primary indices k and l are
equal. Thus the matrix is block diagonal, the entries of the lth diagonal block
being $c_{ij}$. □
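Here is a concrete instance of Proposition 9.54, sketched in Python under the following assumptions: $k = \mathbb{Q}$, $K = \mathbb{Q}(\sqrt2)$ with $\Gamma = (1, \sqrt2)$, $L = \mathbb{Q}(\sqrt2, \sqrt3)$ with $\Delta = (1, \sqrt3)$ over K, so that the ordered basis Ω is $(1, \sqrt2, \sqrt3, \sqrt6)$, and $a = \sqrt2$. The code multiplies elements of L in coordinates and assembles the matrix of $M_{L/\mathbb{Q}}(a)$ column by column; the result should be block diagonal with two copies of $\begin{pmatrix} 0 & 2 \\ 1 & 0\end{pmatrix}$. The encoding of basis elements and the function names are hypothetical.

```python
from fractions import Fraction

# L = Q(sqrt2, sqrt3) over Q with ordered basis Omega = (1, sqrt2, sqrt3, sqrt6),
# i.e. xi_1*omega_1, xi_1*omega_2, xi_2*omega_1, xi_2*omega_2 for
# Gamma = (1, sqrt2) and Delta = (1, sqrt3).
# Basis index i = s + 2*t stands for sqrt(2**s * 3**t), s, t in {0, 1}.

def mul(x, y):
    """Product of two elements of L given by coordinate lists in Omega."""
    out = [Fraction(0)] * 4
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            s1, t1 = i & 1, i >> 1
            s2, t2 = j & 1, j >> 1
            # sqrt(2^s1 3^t1) * sqrt(2^s2 3^t2) = scalar * sqrt(2^(s1 xor s2) 3^(t1 xor t2))
            scalar = (2 if s1 and s2 else 1) * (3 if t1 and t2 else 1)
            out[(s1 ^ s2) + 2 * (t1 ^ t2)] += scalar * xi * yj
    return out

def mult_matrix(a):
    """Matrix of M_{L/Q}(a) relative to Omega: column j is a times basis vector j."""
    cols = [mul(a, [Fraction(int(i == j)) for i in range(4)]) for j in range(4)]
    return [[cols[j][i] for j in range(4)] for i in range(4)]

sqrt2 = [Fraction(0), Fraction(1), Fraction(0), Fraction(0)]
A = mult_matrix(sqrt2)
for row in A:
    print([int(v) for v in row])
# [0, 2, 0, 0]
# [1, 0, 0, 0]
# [0, 0, 0, 2]
# [0, 0, 1, 0]
```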


Corollary 9.55. Let K/k and L/K be finite extensions of fields with
[L : K] = m, and let a be in K. Let $M_{K/k}(a)$ and $M_{L/k}(a)$ denote multiplication
by a on K and on L, and let $F_{K/k}(X)$ and $F_{L/k}(X)$ be the corresponding field
polynomials. Then

$$F_{L/k}(X) = \big(F_{K/k}(X)\big)^m.$$

Consequently $N_{L/k}(a) = \big(N_{K/k}(a)\big)^m$ and $\mathrm{Tr}_{L/k}(a) = m\,\mathrm{Tr}_{K/k}(a)$.
PROOF. Proposition 9.54 shows that the matrix of $XI - M_{L/k}(a)$ may be
taken to be block diagonal with each of the m diagonal blocks equal to the
matrix of $XI - M_{K/k}(a)$. The determinant of $XI - M_{L/k}(a)$ is the product
of the determinants of the diagonal blocks, and the formula relating the field
polynomials is proved.

The formulas for the norms and the traces are consequences of this relationship.
In fact, let

$$F_{K/k}(X) = X^n + c_{n-1}X^{n-1} + \cdots + c_0$$
$$\text{and}\quad F_{L/k}(X) = X^{mn} + d_{mn-1}X^{mn-1} + \cdots + d_0.$$

Comparing coefficients of $F_{L/k}(X)$ and $(F_{K/k}(X))^m$, we see that $d_{mn-1} = mc_{n-1}$
and $d_0 = c_0^m$. Therefore

$$N_{L/k}(a) = (-1)^{mn}d_0 = \big((-1)^n c_0\big)^m = \big(N_{K/k}(a)\big)^m$$
$$\text{and}\quad \mathrm{Tr}_{L/k}(a) = -d_{mn-1} = -mc_{n-1} = m\,\mathrm{Tr}_{K/k}(a).$$

This completes the proof. □
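Corollary 9.55 can be checked on the block-diagonal matrix of multiplication by $\sqrt2$ on $L = \mathbb{Q}(\sqrt2, \sqrt3)$ in the ordered basis $(1, \sqrt2, \sqrt3, \sqrt6)$, where m = 2. The Python sketch below computes characteristic polynomials exactly with the Faddeev-LeVerrier recursion; that recursion is not a method from the text, just a convenient exact procedure over Q (it divides by 1, ..., n and so needs characteristic 0). The hard-coded matrices and function names are assumptions for illustration.

```python
from fractions import Fraction

def charpoly(A):
    """det(XI - A) by the Faddeev-LeVerrier recursion, as [1, c_{n-1}, ..., c_0]."""
    n = len(A)
    A = [[Fraction(v) for v in row] for row in A]
    M = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]   # M_1 = I
    coeffs = [Fraction(1)]
    for k in range(1, n + 1):
        AM = [[sum(A[i][t] * M[t][j] for t in range(n)) for j in range(n)] for i in range(n)]
        c = -sum(AM[i][i] for i in range(n)) / k       # c_{n-k} = -tr(A M_k) / k
        coeffs.append(c)
        M = [[AM[i][j] + (c if i == j else 0) for j in range(n)] for i in range(n)]
    return coeffs

def poly_power(p, m):
    """Coefficient list of p(X)^m (same coefficient ordering as p)."""
    result = [Fraction(1)]
    for _ in range(m):
        result = [sum(result[i] * p[k - i] for i in range(len(result)) if 0 <= k - i < len(p))
                  for k in range(len(result) + len(p) - 1)]
    return result

B = [[0, 2], [1, 0]]                    # M_{K/Q}(sqrt 2), K = Q(sqrt 2)
A = [[0, 2, 0, 0], [1, 0, 0, 0],
     [0, 0, 0, 2], [0, 0, 1, 0]]        # M_{L/Q}(sqrt 2), L = Q(sqrt 2, sqrt 3)
F_K, F_L = charpoly(B), charpoly(A)
m, n = 2, len(B)
print(F_L == poly_power(F_K, m))        # True: F_L = (F_K)^m
norm_L, trace_L = (-1) ** (m * n) * F_L[-1], -F_L[1]
norm_K, trace_K = (-1) ** n * F_K[-1], -F_K[1]
print(norm_L == norm_K ** m, trace_L == m * trace_K)   # True True
```

Here $F_K(X) = X^2 - 2$ and $F_L(X) = X^4 - 4X^2 + 4$, so $N_{L/\mathbb{Q}}(\sqrt2) = 4 = (-2)^2$ and both traces are 0.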

Corollary 9.56. Let K/k be a finite extension of fields, and let a be in K. Then
the field polynomial of a relative to K/k is a power of the minimal polynomial of
a over k, the power being [K : k(a)]. In the special case K = k(a), the minimal
polynomial of a coincides with the field polynomial.

Remarks.  In  the  theory  of  a  single  linear  transformation  as  in  Chapter  V, 
the  minimal  polynomial  of  a  linear  map  divides  the  characteristic  polynomial,  by 
the  Cayley-Hamilton  Theorem  (Theorem  5.9).  For  a  multiplication  operator  in 
the context of fields, we get a much more precise result, namely that the characteristic
polynomial is a power of the minimal polynomial.

PROOF. If F(X) is in k[X], then the operation M of multiplication has

$$M(F(a))b = F(a)b = F(M(a))b \qquad \text{for } b \in K, \qquad (*)$$

as we see by first considering monomials and then forming k linear combinations.
The minimal polynomial of a over k is the unique monic F(X) of lowest degree
in k[X] for which F(a) = 0, hence such that M(F(a)) = 0. Meanwhile, the
minimal polynomial of the linear map M(a) is the unique monic F(X) of lowest
degree such that F(M(a)) = 0. These two polynomials coincide because of (*).

The degree of the minimal polynomial of M(a) thus equals the degree of the
minimal polynomial of a, which is [k(a) : k]. The Cayley-Hamilton Theorem
(Theorem 5.9) shows that the minimal polynomial of M(a) divides the
characteristic polynomial of M(a), i.e., the field polynomial of a. When the field K is
k(a), the minimal polynomial of a and the field polynomial of a have the same
degree; since they are monic, they are equal. This proves the second conclusion
of the corollary.

For the first conclusion we know from Corollary 9.55 that the field polynomial
of a relative to a general K is the [K : k(a)]th power of the field polynomial of a
relative to k(a). Since we have just seen that the latter polynomial is the minimal
polynomial of a, the first conclusion of the corollary follows. □
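Both conclusions of Corollary 9.56 can be seen in $L = \mathbb{Q}(\sqrt2, \sqrt3)$. The element $\sqrt2 + \sqrt3$ generates L, so its field polynomial should be its minimal polynomial $X^4 - 10X^2 + 1$; the element $\sqrt2$ has $[L : \mathbb{Q}(\sqrt2)] = 2$, so its field polynomial should be $(X^2 - 2)^2$. The Python sketch below repeats, under the same hypothetical encoding of the basis $(1, \sqrt2, \sqrt3, \sqrt6)$ as before, the coordinate multiplication and the Faddeev-LeVerrier recursion (an exact method chosen for convenience, not taken from the text).

```python
from fractions import Fraction

# L = Q(sqrt2, sqrt3), basis (1, sqrt2, sqrt3, sqrt6); index s + 2t <-> sqrt(2^s 3^t).
def mul(x, y):
    out = [Fraction(0)] * 4
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            s1, t1, s2, t2 = i & 1, i >> 1, j & 1, j >> 1
            scalar = (2 if s1 and s2 else 1) * (3 if t1 and t2 else 1)
            out[(s1 ^ s2) + 2 * (t1 ^ t2)] += scalar * xi * yj
    return out

def mult_matrix(a):
    """Matrix of multiplication by a: column j is a times basis vector j."""
    cols = [mul(a, [Fraction(int(i == j)) for i in range(4)]) for j in range(4)]
    return [[cols[j][i] for j in range(4)] for i in range(4)]

def charpoly(A):
    """det(XI - A) via Faddeev-LeVerrier, returned as [1, c_{n-1}, ..., c_0]."""
    n = len(A)
    M = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    coeffs = [Fraction(1)]
    for k in range(1, n + 1):
        AM = [[sum(A[i][t] * M[t][j] for t in range(n)) for j in range(n)] for i in range(n)]
        c = -sum(AM[i][i] for i in range(n)) / k
        coeffs.append(c)
        M = [[AM[i][j] + (c if i == j else 0) for j in range(n)] for i in range(n)]
    return coeffs

a = [Fraction(v) for v in (0, 1, 1, 0)]   # sqrt2 + sqrt3, a generator of L over Q
b = [Fraction(v) for v in (0, 1, 0, 0)]   # sqrt2, with [L : Q(sqrt2)] = 2
print([int(c) for c in charpoly(mult_matrix(a))])   # [1, 0, -10, 0, 1]
print([int(c) for c in charpoly(mult_matrix(b))])   # [1, 0, -4, 0, 4]
```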

Example, continued. $k = \mathbb{Q}$, $K = \mathbb{Q}(\sqrt{2})$, $a = \sqrt{2}$. We have seen that
the field polynomial of a is $X^2 - 2$, that the norm and trace are N(a) = -2 and
Tr(a) = 0, and that the matrix of the multiplication operator M(a) in the ordered
basis $\Gamma = (1, \sqrt{2})$ is $\begin{pmatrix} 0 & 2 \\ 1 & 0 \end{pmatrix}$. The eigenvalues of this matrix are $\pm\sqrt{2}$,
namely the roots of the field polynomial. These are not in the field k. Indeed,
they could not possibly be in the field, or we would have $M(a)x = \lambda x$ for some
$x \neq 0$ in K and some λ in k, and this would mean that λ = a, which is impossible
because a is not in k. Since the roots
$\pm\sqrt{2}$ of the field polynomial each have multiplicity 1 and lie in K, the matrix
$\begin{pmatrix} 0 & 2 \\ 1 & 0 \end{pmatrix}$ is similar over K to the diagonal matrix $\begin{pmatrix} \sqrt{2} & 0 \\ 0 & -\sqrt{2} \end{pmatrix}$. Since similar matrices
have the same trace and the same determinant, we can compute the trace and norm of
M(a) from this diagonal matrix, namely by adding or multiplying its diagonal
entries. The significance of the diagonal entries is that they are the images of $\sqrt{2}$
under the members of the Galois group Gal(K/k). We shall now generalize these
considerations. Additional complications arise when K/k fails to be separable
and normal.18
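The similarity in this example can be made explicit. Solving $M(a)v = \lambda v$ gives eigenvectors $(\lambda, 1)$ for $\lambda = \pm\sqrt2$, so one may take $C = \begin{pmatrix} \sqrt2 & -\sqrt2 \\ 1 & 1\end{pmatrix}$. The Python sketch below approximates the entries of K by floating-point numbers (an assumption made only for brevity) and checks that $C^{-1}AC$ is $\mathrm{diag}(\sqrt2, -\sqrt2)$, with trace 0 and determinant -2. The helper names are hypothetical.

```python
import math

s = math.sqrt(2)
A = [[0.0, 2.0], [1.0, 0.0]]   # matrix of M(sqrt 2) in the basis (1, sqrt 2)
C = [[s, -s], [1.0, 1.0]]       # columns: eigenvectors (lambda, 1) for lambda = +sqrt 2, -sqrt 2

def matmul(X, Y):
    return [[sum(X[i][t] * Y[t][j] for t in range(2)) for j in range(2)] for i in range(2)]

def inverse(X):
    """Inverse of a 2 x 2 matrix via the adjugate formula."""
    det = X[0][0] * X[1][1] - X[0][1] * X[1][0]
    return [[X[1][1] / det, -X[0][1] / det], [-X[1][0] / det, X[0][0] / det]]

Dg = matmul(inverse(C), matmul(A, C))
trace = Dg[0][0] + Dg[1][1]
det = Dg[0][0] * Dg[1][1] - Dg[0][1] * Dg[1][0]
print(Dg)            # approximately [[1.414..., 0], [0, -1.414...]]
print(trace, det)    # approximately 0 and -2
```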


Proposition 9.57. Let k be a field, let k(a) be an algebraic extension of k, and
suppose that the minimal polynomial F(X) of a over k is separable. Let K be a
splitting field of F(X), and factor F(X) over K as

$$F(X) = (X - a_1)(X - a_2)\cdots(X - a_n)$$

with all $a_j \in K$ and with $a_1 = a$. Then the matrix of the multiplication operator
$M_{k(a)/k}(a)$ of a on k(a) is similar over K to a diagonal matrix with diagonal
entries $a_1, \dots, a_n$. Consequently

$$N_{k(a)/k}(a) = \prod_{j=1}^{n} a_j \quad\text{and}\quad \mathrm{Tr}_{k(a)/k}(a) = \sum_{j=1}^{n} a_j.$$

18The above argument used a matrix with entries in k and considered the entries as in the larger
field K. The reader may wonder what the corresponding construction is for the k linear map M(a). It
is not to treat M(a) as a K linear map on K, since then M(a) would have just the one eigenvalue $\sqrt{2}$,
which would have multiplicity 1. Instead, it is to use tensor products as in Chapter VI, knowledge
of which is not being assumed at present. The idea is to extend scalars, replacing K by $K \otimes_k K$ and
replacing M(a) by $M(a) \otimes 1$. The K linearity occurs in the second member of the tensor product,
not the first, and the operator $M(a) \otimes 1$ is the K linear map with eigenvalues $\pm\sqrt{2}$.
Remarks. The elements $a_1, \dots, a_n$ of K, with $a_1 = a$, are called the
conjugates of a over k. The conjugates of a are the images of a under the
Galois group when k(a) is Galois over k, but they extend outside k(a) when k(a)/k
is not normal.

Proof. Corollary 9.56 shows that F(X) equals the field polynomial of a
relative to k(a)/k, i.e., is the characteristic polynomial of the multiplication
operator $M_{k(a)/k}(a)$. Let A be the matrix of $M_{k(a)/k}(a)$ in some ordered basis of
k(a) over k. If we regard A as a matrix with entries in K, then the characteristic
polynomial of A splits in K, and the roots of the characteristic polynomial have
multiplicity 1, by separability. Consequently A has a basis of eigenvectors, the
eigenvectors being column vectors with entries in K and the eigenvalues being the
members $a_1, \dots, a_n$ of K. It follows that A is similar over K to a diagonal matrix
with diagonal entries $a_1, \dots, a_n$. The determinant and trace of this diagonal
matrix equal the determinant and trace of A, and therefore the norm and trace of
a are the product and sum of the members $a_1, \dots, a_n$ of K. □
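A non-normal case shows why the conjugates may lie outside k(a). Take $k = \mathbb{Q}$ and $a = \sqrt[3]{2}$, whose minimal polynomial $X^3 - 2$ is separable; in the basis $(1, a, a^2)$, multiplication by a is the companion matrix below. The Python sketch checks that its determinant and trace (2 and 0) match the product and sum of the conjugates $\sqrt[3]{2}, \omega\sqrt[3]{2}, \omega^2\sqrt[3]{2}$, two of which are not real and so are not in $\mathbb{Q}(\sqrt[3]{2}) \subset \mathbb{R}$. The conjugates are approximated in floating point, which is an assumption for illustration only.

```python
import cmath

# Matrix of M(a), a = 2^(1/3), in the basis (1, a, a^2): a*1 = a, a*a = a^2, a*a^2 = 2.
M = [[0, 0, 2],
     [1, 0, 0],
     [0, 1, 0]]

def det3(m):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

norm = det3(M)                       # exact integer determinant
trace = M[0][0] + M[1][1] + M[2][2]

alpha = 2 ** (1 / 3)                 # real cube root of 2
omega = cmath.exp(2j * cmath.pi / 3) # primitive cube root of unity
conjugates = [alpha, omega * alpha, omega ** 2 * alpha]   # only a_1 = a is real

prod = conjugates[0] * conjugates[1] * conjugates[2]
total = sum(conjugates)
print(norm, trace)                                          # 2 0
print(abs(prod - norm) < 1e-12, abs(total - trace) < 1e-12) # True True
```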

Corollary 9.58. Let K be a finite Galois extension of the field k, let G =
Gal(K/k), let L be an intermediate field with k ⊆ L ⊆ K, and let H = Gal(K/L)
as a subgroup of G. Fix an ordered basis Γ of L over k. Then the expression “σ(a)
for σ ∈ G/H” is well defined for a in L, and there exists a nonsingular matrix
C of size [L : k] with entries in K such that for every a in L, $C^{-1}A(a)C$ is
diagonal with diagonal entries σ(a) for σ ∈ G/H, where A(a) denotes the matrix of
$M_{L/k}(a)$ relative to Γ. In particular, every member a of L has norm and trace given by

$$N_{L/k}(a) = \prod_{\sigma \in G/H} \sigma(a) \quad\text{and}\quad \mathrm{Tr}_{L/k}(a) = \sum_{\sigma \in G/H} \sigma(a).$$

PROOF. Let a be in L, σ be in G, and τ be in H. Then τ(a) = a, and therefore
στ(a) = σ(a). Consequently all members of the coset σH of G/H have the
same value on a, and “σ(a) for σ ∈ G/H” is well defined.

Let n = [L : k] = |G/H|. Fix an ordered basis Γ of L over k. For each a ∈ L,
let A(a) be the matrix of the multiplication operator $M_{L/k}(a)$ relative to Γ.

The Theorem of the Primitive Element (Theorem 9.34) shows that L = k(x)
for some x. Proposition 9.57 applies to this element x and to a splitting field
within K for its minimal polynomial, showing that there is a nonsingular matrix
C with entries in K such that $C^{-1}A(x)C$ is a diagonal matrix whose diagonal
entries are the n conjugates $x_1, \dots, x_n$ of x in K, $x_1$ being x; the diagonal entries
are necessarily distinct by separability. For each i with $1 \le i \le n$, there exists $\sigma_i$
in G with $\sigma_i(x) = x_i$, by Theorems 9.11 and 9.23. Since H fixes L, every member
of the coset $\sigma_i H$ carries x to $x_i$. On the other hand, every σ in G must carry x to
some conjugate, hence must have $\sigma(x) = \sigma_i(x)$ for some i. Then $\sigma_i^{-1}\sigma$ fixes x
and hence L, and it follows that $\sigma_i^{-1}\sigma$ is in H. Thus σ is in $\sigma_i H$. In other words,
the conjugates $x_1, \dots, x_n$ may be regarded exactly as the images of x under the n cosets
$\sigma_i H$.

In this terminology the diagonal entries of $C^{-1}A(x)C$ are the n elements σ(x)
for σ in G/H. For each j with $0 \le j \le n - 1$, we have $A(x^j) = A(x)^j$, and hence
$C^{-1}A(x^j)C = (C^{-1}A(x)C)^j$ is diagonal with diagonal entries $\sigma(x)^j = \sigma(x^j)$ for
σ in G/H. Forming k linear combinations, we see for every polynomial P(X)
in k[X] of degree $\le n - 1$ that $C^{-1}A(P(x))C$ is diagonal with diagonal entries
σ(P(x)). Every element a of L is of the form P(x) for some such P(X), and
the existence of C in the statement of the corollary is proved. The formulas for
the norm and trace follow by taking the determinant and trace. □
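A small case of Corollary 9.58 can be worked out by machine. Assume $k = \mathbb{Q}$, $K = \mathbb{Q}(\sqrt2, \sqrt3)$, and $L = \mathbb{Q}(\sqrt2)$, and describe G by sign pairs $(e_2, e_3)$ with $\sqrt2 \mapsto e_2\sqrt2$ and $\sqrt3 \mapsto e_3\sqrt3$, so that $H = \mathrm{Gal}(K/L) = \{(1,1), (1,-1)\}$. The Python sketch below (all names hypothetical) checks that σ(a) is constant on each coset σH for an element $a = x + y\sqrt2$ of L, and that the product and sum over G/H equal the determinant and trace of the matrix $\begin{pmatrix} x & 2y \\ y & x\end{pmatrix}$ of $M_{L/\mathbb{Q}}(a)$.

```python
from fractions import Fraction
from itertools import product

# G = Gal(K/Q) as sign pairs (e2, e3); H = Gal(K/L) fixes sqrt2.
G = list(product((1, -1), repeat=2))
H = [(1, 1), (1, -1)]

def act(sigma, a):
    """Apply sigma to a = (x, y), meaning x + y*sqrt2 in L; only e2 matters on L."""
    e2, _ = sigma
    x, y = a
    return (x, e2 * y)

def mul(a, b):
    """Product in Q(sqrt2) of a = (x1, y1) and b = (x2, y2)."""
    return (a[0] * b[0] + 2 * a[1] * b[1], a[0] * b[1] + a[1] * b[0])

def compose(s, t):
    return (s[0] * t[0], s[1] * t[1])

def cosets():
    """The cosets sigma*H of H in G, as frozensets."""
    return {frozenset(compose(s, h) for h in H) for s in G}

def norm_trace_via_cosets(a):
    prod_val, sum_val = (Fraction(1), Fraction(0)), (Fraction(0), Fraction(0))
    for coset in cosets():
        images = {act(s, a) for s in coset}
        assert len(images) == 1              # sigma(a) depends only on the coset
        img = images.pop()
        prod_val = mul(prod_val, img)
        sum_val = (sum_val[0] + img[0], sum_val[1] + img[1])
    return prod_val, sum_val

a = (Fraction(3), Fraction(5, 2))
x, y = a
det_M = x * x - 2 * y * y    # det of [[x, 2y], [y, x]], the matrix of M_{L/Q}(a)
tr_M = 2 * x
print(norm_trace_via_cosets(a) == ((det_M, Fraction(0)), (tr_M, Fraction(0))))   # True
```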

Corollary 9.59. If K is a finite separable extension of the field k, then the
trace function $\mathrm{Tr}_{K/k}$ is not identically 0.

Remarks. This result is trivial in characteristic 0 because $\mathrm{Tr}_{K/k}(1) = [K : k]$
is not zero. The result is not so evident in characteristic p, and the assumption
of separability is crucial. An example for which separability fails and
the trace function is identically 0 has k = F(x), where F is a finite field of
characteristic p and x is transcendental, and $K = k(x^{1/p})$. The basis elements
$1, x^{1/p}, x^{2/p}, \dots, x^{(p-1)/p}$ all have trace 0, and therefore the trace is identically 0.

PROOF. By the Theorem of the Primitive Element (Theorem 9.34), we can
write K = k(a) for some $a \neq 0$. Let K' be a splitting field for the minimal
polynomial of a over k. Then K'/k is a separable extension by Corollary 9.30
and hence is a finite Galois extension. Proposition 9.57 shows that the matrix of
$M_{K/k}(a)$ in any ordered basis of K over k is similar over K' to a diagonal matrix
with entries $a_j$, where $a_1, \dots, a_n$ are the conjugates of a with $a_1 = a$.

These conjugates are necessarily distinct by separability. For $1 \le k \le n$, the
matrix of $M_{K/k}(a^k)$ is similar via the same matrix over K' to a diagonal matrix
with entries $a_1^k, \dots, a_n^k$. If $\mathrm{Tr}_{K/k}(a^k) = 0$ for $1 \le k \le n$, then we obtain the
homogeneous system of linear equations

$$\begin{aligned}
a_1 x_1 + a_2 x_2 + \cdots + a_n x_n &= 0, \\
a_1^2 x_1 + a_2^2 x_2 + \cdots + a_n^2 x_n &= 0, \\
&\;\;\vdots \\
a_1^n x_1 + a_2^n x_2 + \cdots + a_n^n x_n &= 0,
\end{aligned}$$

with $(x_1, \dots, x_n) = (1, \dots, 1)$ as a nonzero solution. The coefficient matrix must
therefore have determinant 0. This coefficient matrix, however, is a Vandermonde
matrix except that the jth column is multiplied by $a_j$ for each j. Since $a_1, \dots, a_n$
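are distinct, Corollary 5.3 shows that the determinant of the coefficient matrix
can be 0 only if $a_1 a_2 \cdots a_n = 0$. Since $a \neq 0$, every conjugate $a_j$ is nonzero (each is a
root of the minimal polynomial of a, whose constant term is nonzero). We have arrived at
a contradiction, and we conclude that $\mathrm{Tr}_{K/k}(a^k) \neq 0$ for some k. □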
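The inseparable example in the Remarks can be checked mechanically. Write $t = x^{1/p}$ and use the basis $1, t, \dots, t^{p-1}$ of $K = k(t)$ over $k = F(x)$; multiplication by $t^i$ sends $t^j$ to $t^{i+j}$ when $i + j < p$ and to $x\,t^{i+j-p}$ otherwise, so every matrix entry is a monomial in x. The Python sketch below is a hypothetical encoding: it records each diagonal entry as a dictionary {power of x: coefficient mod p} and sums the diagonal, confirming that every basis element has trace 0 (for $t^0 = 1$ the trace is $p \cdot 1 = 0$ in characteristic p). Only the small primes listed are tested.

```python
# k = F(x) with F of characteristic p, K = k(t) with t = x^(1/p), basis 1, t, ..., t^(p-1).
# Diagonal entries of M(t^i) are monomials c * x^e, encoded as {e: c mod p}.

def diagonal_entry(i, j, p):
    """Coefficient of t^j in t^i * t^j, as {power of x: coefficient}."""
    if (i + j) % p != j:          # t^(i+j) reduces to a different basis element
        return {}
    return {(i + j) // p: 1}      # t^(i+j) = x^((i+j)//p) * t^j

def trace_of_power(i, p):
    """Trace of M(t^i) as a polynomial in x with coefficients in Z/pZ."""
    total = {}
    for j in range(p):
        for e, c in diagonal_entry(i, j, p).items():
            total[e] = (total.get(e, 0) + c) % p
    return {e: c for e, c in total.items() if c != 0}   # drop zero coefficients

for p in (2, 3, 5, 7):
    assert all(trace_of_power(i, p) == {} for i in range(p))
print("every basis element t^i has trace 0")
```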

With the aid of Corollary 9.59, we can complete the proof of Theorem 8.54 in
Section VIII.11. Let us restate the part that still needs proof.

Theorem  8.54.  If  R  is  a  Dedekind  domain  with  field  of  fractions  F  and  if  K 
is  a  finite  separable  extension  field  of  F,  then  the  integral  closure  T  of  R  in  K  is 
finitely  generated  as  an  R  module  and  consequently  is  a  Dedekind  domain. 

Remarks. What needs proof is that T is finitely generated as an R module.
It was shown in Section VIII.11 how to deduce as a consequence that T is a
Dedekind domain.

PROOF. Since R is Noetherian (being a Dedekind domain), Proposition 8.34
shows that it is enough to exhibit T as an R submodule of a finitely generated R
module in K. Let $\{u_1, \dots, u_n\}$ be a vector-space basis of K over F. Proposition
8.42 shows that we may assume that each $u_i$ is in T.

Define an F linear map from K into its F vector-space dual K' by $y \mapsto \ell_y$,
where $\ell_y(x) = \mathrm{Tr}_{K/F}(xy)$ for x ∈ K. This map is one-one by Corollary 9.59,
since if $\ell_y = 0$ with $y \neq 0$, then $\mathrm{Tr}_{K/F}$ would vanish on all of $Ky = K$. The
equality of dimensions of K and K' over F therefore implies that the
map is onto. We can thus view every member of K' as uniquely of the form $\ell_y$
for some y in K. With this understanding, let $\{\ell_{v_1}, \dots, \ell_{v_n}\}$ be the dual basis of
K' with $\ell_{v_j}(u_i) = \delta_{ij}$ for all i and j. Then we have

$$\mathrm{Tr}_{K/F}(u_i v_j) = \delta_{ij} \qquad \text{for all } i \text{ and } j.$$

Applying Proposition 8.42, choose $c \neq 0$ in R with $cv_j$ in T for all j. We shall
complete the proof by showing that

$$T \subseteq Rc^{-1}u_1 + \cdots + Rc^{-1}u_n. \qquad (*)$$

Before doing so, let us observe that

$$\mathrm{Tr}_{K/F}(t) \text{ is in } R \text{ if } t \text{ is in } T. \qquad (**)$$

In fact, Proposition 9.57 shows that $\mathrm{Tr}_{F(t)/F}(t)$ is the sum of all the conjugates of t,
whether or not they are in K. The conjugates have the same minimal polynomial
over F that t has, and hence they are integral over R. Their sum $\mathrm{Tr}_{F(t)/F}(t)$ must
be integral over R by Corollary 8.38, and it must lie in F. Since R is integrally
closed (being a Dedekind domain), $\mathrm{Tr}_{F(t)/F}(t)$ lies in R. By Corollary 9.55,
$\mathrm{Tr}_{K/F}(t) = [K : F(t)]\,\mathrm{Tr}_{F(t)/F}(t)$, and hence $\mathrm{Tr}_{K/F}(t)$ lies in R as well. This proves (**).

Now we can return to the proof of (*). Let x be given in T. Since T is a ring,
$cxv_j$ is in T for each j, and $\mathrm{Tr}_{K/F}(cxv_j)$ is in R by (**). Since $\{u_1, \dots, u_n\}$ is a
basis, we can write $x = \sum_i d_i u_i$ with each $d_i$ in F. Since $\mathrm{Tr}(cxv_j)$ is in R, the
computation

$$\mathrm{Tr}(cxv_j) = c\,\mathrm{Tr}_{K/F}(xv_j) = c\sum_{i=1}^{n} d_i\,\mathrm{Tr}(u_i v_j) = cd_j$$

shows that $cd_j$ is in R. Then the expansion $x = \sum_j (cd_j)c^{-1}u_j$ exhibits x as in
$Rc^{-1}u_1 + \cdots + Rc^{-1}u_n$ and completes the proof of (*). □
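The dual-basis construction in this proof can be carried out explicitly in a small case. Assume $R = \mathbb{Z}$, $F = \mathbb{Q}$, $K = \mathbb{Q}(\sqrt5)$, and $u = (1, \sqrt5)$, a basis contained in T. The Python sketch below (names hypothetical) forms the matrix $\big(\mathrm{Tr}(u_i u_j)\big)$, inverts it to get the dual basis $v_1 = 1/2$, $v_2 = \sqrt5/10$ with $\mathrm{Tr}(u_i v_j) = \delta_{ij}$, and then recovers the coordinates $d_j = \mathrm{Tr}(x v_j)$ of $x = (1 + \sqrt5)/2$, an element of T (a root of $X^2 - X - 1$) that is not in $\mathbb{Z}u_1 + \mathbb{Z}u_2$. With c = 10 the numbers $cd_j$ are integers, as (*) predicts.

```python
from fractions import Fraction

D = 5  # K = Q(sqrt 5), F = Q, R = Z; elements are pairs (x, y) meaning x + y*sqrt(D)

def mul(a, b):
    return (a[0] * b[0] + D * a[1] * b[1], a[0] * b[1] + a[1] * b[0])

def trace(a):
    # Multiplication by x + y*sqrt(D) has matrix [[x, D*y], [y, x]], so the trace is 2x.
    return 2 * a[0]

u = [(Fraction(1), Fraction(0)), (Fraction(0), Fraction(1))]   # basis 1, sqrt(D), both in T

# Gram matrix Gm[i][j] = Tr(u_i u_j); the dual basis is v_j = sum_i H[i][j] u_i with H = Gm^{-1}.
Gm = [[trace(mul(ui, uj)) for uj in u] for ui in u]
det = Gm[0][0] * Gm[1][1] - Gm[0][1] * Gm[1][0]
H = [[Gm[1][1] / det, -Gm[0][1] / det], [-Gm[1][0] / det, Gm[0][0] / det]]
v = [tuple(H[0][j] * u[0][k] + H[1][j] * u[1][k] for k in range(2)) for j in range(2)]
assert all(trace(mul(u[i], v[j])) == (1 if i == j else 0) for i in range(2) for j in range(2))

c = 10                                   # a common denominator, so that c * v_j lies in T
phi = (Fraction(1, 2), Fraction(1, 2))   # (1 + sqrt 5)/2, integral over Z
coords = [trace(mul(phi, vj)) for vj in v]   # d_j = Tr(phi * v_j)
print(v)
print(coords, all((c * d).denominator == 1 for d in coords))
```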


16.  Splitting  of  Prime  Ideals  in  Extensions 

Section VIII.7 was a section of motivation showing the importance for number
theory and geometry of passing from factorization of elements to factorization
of ideals. The later sections of Chapter VIII set the framework for this study,
examining the notions of Noetherian domain, integral closure, and localization
and putting them together in the notion of Dedekind domain. Only just now
were we able to complete the proof of the fundamental result (Theorem 8.54) for
constructing Dedekind domains out of other Dedekind domains. However, that
theorem does not complete the task of extending what is in Section VIII.7 to a
wider context. Much of Section VIII.7 concerned the relationship between prime
ideals in one domain and prime ideals in an extension. In the present section we
put that relationship in a wider context, showing how the examples of Section
VIII.7 are special cases of the present theory.

In two of the examples in Section VIII.7, we worked with the ring Z of integers
inside its field of fractions Q and with the ring T of algebraic integers within a
quadratic extension K of Q. In the third example in that section, we worked
with  the  ring  C[x],  for  transcendental  x,  inside  its  field  of  fractions  C(x)  and 
with  a  certain  integral  domain  T  within  a  quadratic  extension  of  C(x).  For  all 
three  examples  we  saw  a  correspondence  between  prime  ideals  P  in  T  and  prime 
ideals  (p)  in  Z  or  C[x],  and  that  correspondence  was  formalized  in  a  more  general 
setting  in  Propositions  8.43  and  8.53.  The  objective  now  is  to  understand  that 
correspondence  a  little  better. 

The notation for this section is as follows: Let R be a Dedekind domain, such
as Z or C[x], and let F be its field of fractions.19 Let K be a finite separable
extension  of  F,  and  let  T  be  the  integral  closure  of  R  in  K.  Theorem  8.54, 
including  the  part  just  proved  in  the  previous  section,  shows  that  T  is  a  Dedekind 
domain.  We  make  repeated  use  of  the  fact  about  Dedekind  domains  that  every 
nonzero  prime  ideal  is  maximal. 

19It  might  seem  more  natural  to  assume  that  R  is  a  principal  ideal  domain,  as  it  is  with  Z  and 
C[x].  But  that  extra  assumption  will  not  help  us,  and  it  will  often  not  be  satisfied  when  the  present 
results  are  used  in  the  proof  of  the  important  Theorem  9.64  in  the  next  section. 
Proposition 8.43 shows that if $P$ is any nonzero prime ideal of $T$, then $\mathfrak{p} = R \cap P$ is a nonzero prime ideal of $R$. In the reverse direction Proposition 8.53 shows that if $\mathfrak{p}$ is any nonzero prime ideal in $R$, then $\mathfrak{p}T \ne T$, and there exists at least one prime ideal $P$ of $T$ with $\mathfrak{p} = R \cap P$. The unique factorization of ideals in $T$ (given as Theorem 8.55) explains this correspondence better. If $\mathfrak{p}$ is given, then $\mathfrak{p}T$ is a proper ideal, hence is contained in some maximal ideal $P$. Since "to contain is to divide" (by Theorem 8.55d), such $P$'s (and only such $P$'s) are factors in the decomposition of $\mathfrak{p}T$ as the product of nonzero prime ideals. Accordingly let us write

$$\mathfrak{p}T = \prod_{i=1}^{g} P_i^{e_i},$$

where the $P_i$ are the distinct prime ideals of $T$ containing $\mathfrak{p}T$, or equivalently the distinct prime ideals of $T$ satisfying $R \cap P_i = \mathfrak{p}$. The $e_i$ are positive integers called the ramification indices.

For each $P_i$, we can form the composition $R \subseteq T \to T/P_i$ of inclusion followed by passage to the quotient. Since $\mathfrak{p} \subseteq P_i$, this composition descends to a ring homomorphism $R/\mathfrak{p} \to T/P_i$. The ideal $\mathfrak{p}$ is maximal in $R$, and the ideal $P_i$ is maximal in $T$. Thus the mapping $R/\mathfrak{p} \to T/P_i$ is in fact a field map. We regard it as an inclusion. Define

$$f_i = [T/P_i : R/\mathfrak{p}],$$

allowing the dimension for the moment possibly to be $+\infty$. It will follow from Theorem 9.60, however, that $f_i$ is finite. The integer $f_i$ is called the residue class degree.

Theorem 9.60. Let $R$ be a Dedekind domain, let $F$ be its field of fractions, let $K$ be a finite separable extension of $F$ with $[K : F] = n$, and let $T$ be the integral closure of $R$ in $K$. If $\mathfrak{p}$ is a nonzero prime ideal in $R$ and $\mathfrak{p}T = \prod_{i=1}^{g} P_i^{e_i}$ is a decomposition of $\mathfrak{p}T$ as the product of powers of distinct nonzero prime ideals in $T$, then the ramification indices $e_i$ and residue class degrees $f_i = [T/P_i : R/\mathfrak{p}]$ are related by

$$\sum_{i=1}^{g} e_i f_i = n.$$

Remarks. Consequently each $f_i$ is finite. The cases of interest for our earlier examples have $R = \mathbb{Z}$ or $R = \mathbb{C}[x]$. When $R = \mathbb{Z}$, each $R/\mathfrak{p}$ is a finite field. However, when $R = k[x]$ for some field $k$ of characteristic 0 like $k = \mathbb{C}$, then each $R/\mathfrak{p}$ is a finite extension of $k$, hence is an infinite field.²⁰

²⁰ When $R = \mathbb{C}[x]$, then $T/P_i = R/\mathfrak{p} = \mathbb{C}$ since $\mathbb{C}$ is algebraically closed. The last example of the present section will elaborate.
PROOF. Corollary 8.63 gives a ring isomorphism

$$T/(\mathfrak{p}T) \cong T/P_1^{e_1} \times \cdots \times T/P_g^{e_g}. \qquad (*)$$

Recall from the definition of residue class degree that we have a field mapping of $R/\mathfrak{p}$ into each $T/P_i$. Since $\mathfrak{p} \subseteq P_i^e$ for $1 \le e \le e_i$ and since $\mathfrak{p} \subseteq \mathfrak{p}T$, it follows similarly that we have a one-one ring homomorphism of $R/\mathfrak{p}$ into each $T/P_i^e$ with $1 \le e \le e_i$ and another one-one ring homomorphism of $R/\mathfrak{p}$ into $T/(\mathfrak{p}T)$. Consequently each $T/P_i^e$ with $1 \le e \le e_i$, the product $T/P_1^{e_1} \times \cdots \times T/P_g^{e_g}$, and $T/(\mathfrak{p}T)$ may all be regarded as unital $R/\mathfrak{p}$ modules, i.e., as vector spaces over the field $R/\mathfrak{p}$. Fix $i$. For $1 \le e \le e_i$, let us prove by induction on $e$ that

$$\dim_{R/\mathfrak{p}}(T/P_i^e) = e f_i, \qquad (**)$$

the case $e = 1$ being the base case of the induction. Assume inductively that (**) holds for exponents from 1 to $e - 1$. We know from Corollary 8.60 that $P_i^{e-1}/P_i^e$ is a vector space over the field $T/P_i$ with

$$\dim_{T/P_i}(P_i^{e-1}/P_i^e) = 1. \qquad (\dagger)$$

The First Isomorphism Theorem (as in the remark with Theorem 8.3) gives $T/P_i^{e-1} \cong (T/P_i^e)/(P_i^{e-1}/P_i^e)$ as vector spaces over $R/\mathfrak{p}$, and it follows that

$$\begin{aligned} \dim_{R/\mathfrak{p}}(T/P_i^e) &= \dim_{R/\mathfrak{p}}(T/P_i^{e-1}) + \dim_{R/\mathfrak{p}}(P_i^{e-1}/P_i^e) \\ &= (e-1)f_i + f_i = e f_i, \end{aligned}$$

the next-to-last equality following from (†) and the inductive hypothesis for the cases $e - 1$ and 1. This completes the induction and the proof of (**).

In view of the decomposition (*) and the formula (**) when $e = e_i$, the theorem will follow if it is shown that

$$\dim_{R/\mathfrak{p}}(T/\mathfrak{p}T) = n. \qquad (\dagger\dagger)$$

To prove (††) we localize. Let $S$ be the complement of the prime ideal $\mathfrak{p}$ of $R$. Corollary 8.48 shows that $S^{-1}R$ is a Dedekind domain, Corollary 8.50 shows that $S^{-1}\mathfrak{p}$ is its unique maximal ideal, and Corollary 8.62 shows that $S^{-1}R$ is a principal ideal domain.

The composition $R \subseteq S^{-1}R \to S^{-1}R/S^{-1}\mathfrak{p}$ descends to a field mapping $R/\mathfrak{p} \to S^{-1}R/S^{-1}\mathfrak{p}$. Let us see that this mapping is onto. If $s_0^{-1}r_0 + S^{-1}\mathfrak{p}$ in $S^{-1}R/S^{-1}\mathfrak{p}$ is given, then $s_0$ is not in $\mathfrak{p}$, and the maximality of $\mathfrak{p}$ as an ideal in $R$ implies that $(s_0) + \mathfrak{p} = R$. Therefore we can choose $r$ in $R$ and $x$ in $\mathfrak{p}$ with $rs_0 + x = r_0$. Under the mapping $R/\mathfrak{p} \to S^{-1}R/S^{-1}\mathfrak{p}$, the image of $r + \mathfrak{p}$ is

$$r + S^{-1}\mathfrak{p} = r + s_0^{-1}x + S^{-1}\mathfrak{p} = s_0^{-1}(rs_0 + x) + S^{-1}\mathfrak{p} = s_0^{-1}r_0 + S^{-1}\mathfrak{p}.$$

Thus our mapping is onto $S^{-1}R/S^{-1}\mathfrak{p}$, and we have an isomorphism of fields

$$R/\mathfrak{p} \cong S^{-1}R/S^{-1}\mathfrak{p}. \qquad (\ddagger)$$
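For $R = \mathbb{Z}$ and $\mathfrak{p} = (p)$, choosing $r$ with $rs_0 + x = r_0$ and $x \in p\mathbb{Z}$ amounts to taking $r \equiv r_0 s_0^{-1} \pmod p$. The Python sketch below (the function names are hypothetical, chosen only for illustration) computes such an $r$ and checks that $s_0^{-1}r_0 - r$ lies in $S^{-1}(p\mathbb{Z})$, which for a fraction in lowest terms means a numerator divisible by $p$ and a denominator prime to $p$.

```python
from fractions import Fraction


def preimage_mod_p(r0: int, s0: int, p: int) -> int:
    """Return r with r*s0 + x = r0 for some x in pZ, i.e. r = r0 * s0^(-1) mod p."""
    if s0 % p == 0:
        raise ValueError("s0 must lie in S, the complement of pZ")
    return (r0 * pow(s0, -1, p)) % p


def in_localized_ideal(q: Fraction, p: int) -> bool:
    """Test whether the rational number q lies in S^{-1}(pZ)."""
    # Fraction stores q in lowest terms, so q lies in S^{-1}Z exactly when p does
    # not divide the denominator, and in S^{-1}(pZ) when p also divides the numerator.
    return q.denominator % p != 0 and q.numerator % p == 0


p = 7
for r0, s0 in [(3, 5), (10, 4), (-2, 9)]:
    r = preimage_mod_p(r0, s0, p)
    x = r0 - r * s0                     # the element x of pZ with r*s0 + x = r0
    assert x % p == 0
    # r + S^{-1}p equals s0^{-1} r0 + S^{-1}p, as in the displayed computation
    assert in_localized_ideal(Fraction(r0, s0) - r, p)
```

With $p = 7$, $r_0 = 3$, $s_0 = 5$ the sketch returns $r = 2$, since $5 \cdot 3 \equiv 1 \pmod 7$ and $3 \cdot 3 \equiv 2 \pmod 7$.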

Similarly the composition $T \subseteq S^{-1}T \to S^{-1}T/(S^{-1}\mathfrak{p}T)$ descends to a homomorphism of rings $T/\mathfrak{p}T \to S^{-1}T/(S^{-1}\mathfrak{p}T)$. Let us show that this map too is one-one onto.

If $t + \mathfrak{p}T$ is in the kernel, then the member $t$ of $T$ is in $S^{-1}\mathfrak{p}T$, and $st$ is in $\mathfrak{p}T$ for some $s$ in $S$. Hence we have $(s)(t) \subseteq P_1^{e_1} \cdots P_g^{e_g}$, and we can write $(s)(t) = P_1^{e_1} \cdots P_g^{e_g} Q$ for some ideal $Q$. Factoring the principal ideals $(s)$ and $(t)$ and using the uniqueness of factorization of ideals gives

$$(s) = P_1^{u_1} \cdots P_g^{u_g} Q_1 \quad\text{and}\quad (t) = P_1^{v_1} \cdots P_g^{v_g} Q_2$$

with $Q = Q_1Q_2$ and with $u_j + v_j = e_j$ for all $j$. If $u_j > 0$, then we must have $(s) \subseteq P_j$ and $sR \subseteq P_j \cap R = \mathfrak{p}$. This says that $s$ is in $\mathfrak{p}$, in contradiction to the fact that $S$ equals the set-theoretic complement of $\mathfrak{p}$ in $R$. We conclude that $u_j = 0$ for all $j$. Therefore $(t) = P_1^{e_1} \cdots P_g^{e_g} Q_2 \subseteq P_1^{e_1} \cdots P_g^{e_g} = \mathfrak{p}T$, and $t$ is in $\mathfrak{p}T$. Consequently the kernel consists of the 0 coset alone.

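Let us show that $T/\mathfrak{p}T$ maps onto $S^{-1}T/(S^{-1}\mathfrak{p}T)$. If $s_0^{-1}t_0 + S^{-1}\mathfrak{p}T$ in $S^{-1}T/S^{-1}\mathfrak{p}T$ is given, then $s_0$ is not in $\mathfrak{p}$, and the maximality of $\mathfrak{p}$ as an ideal in $R$ implies that $(s_0) + \mathfrak{p} = R$. Therefore we can choose $r$ in $R$ and $x$ in $\mathfrak{p}$ with $rs_0 + x = 1$, hence with $rs_0t_0 + xt_0 = t_0$. Under the mapping $T/\mathfrak{p}T \to S^{-1}T/(S^{-1}\mathfrak{p}T)$, the image of $rt_0 + \mathfrak{p}T$ is

$$\begin{aligned} rt_0 + S^{-1}\mathfrak{p}T &= rt_0 + s_0^{-1}xt_0 + S^{-1}\mathfrak{p}T \\ &= s_0^{-1}(rs_0t_0 + xt_0) + S^{-1}\mathfrak{p}T \\ &= s_0^{-1}t_0 + S^{-1}\mathfrak{p}T. \end{aligned}$$

Thus our mapping is onto $S^{-1}T/S^{-1}\mathfrak{p}T$, and we conclude that we have an isomorphism of rings

$$T/\mathfrak{p}T \cong S^{-1}T/(S^{-1}\mathfrak{p}T). \qquad (\ddagger\ddagger)$$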

Since $T$ is finitely generated as an $R$ module (Theorem 8.54), $S^{-1}T$ is finitely generated as an $S^{-1}R$ module with the same generators. Since $S^{-1}R$ is a principal ideal domain, Theorem 8.25c shows that $S^{-1}T$ is the direct sum of cyclic $S^{-1}R$ modules. Each of these cyclic modules must in fact be isomorphic to $S^{-1}R$ since $S^{-1}T$ has no zero divisors, and therefore $S^{-1}T$ is a free $S^{-1}R$ module of some finite rank $m$. If $t_1, \dots, t_m$ are free generators, then we have

$$S^{-1}T = S^{-1}Rt_1 \oplus \cdots \oplus S^{-1}Rt_m. \qquad (\S)$$
Let us see that $\{t_1, \dots, t_m\}$ is an $F$ vector-space basis of $K$. Suppose $\sum_j c_jt_j = 0$ with all $c_j$ in $F$. Proposition 8.42 shows that there is an $r \ne 0$ in $R$ with $rc_1, \dots, rc_m$ in $R$. Then $\sum_j (rc_j)t_j = 0$, and the independence of $t_1, \dots, t_m$ over $S^{-1}R$ implies that $rc_j = 0$ for all $j$. Thus $c_j = 0$ for all $j$, and we obtain linear independence over $F$. If $x \in K$ is given, we can choose $r \ne 0$ in $R$ with $rx$ in $T$ by Proposition 8.42. Since $t_1, \dots, t_m$ span $S^{-1}T$ over $S^{-1}R$, we can find members $d_1, \dots, d_m$ of $S^{-1}R$ with $rx = \sum_j d_jt_j$. Then $x = \sum_j r^{-1}d_jt_j$ with each coefficient $r^{-1}d_j$ in $F$. This proves the spanning. Hence $\{t_1, \dots, t_m\}$ is an $F$ vector-space basis, and $m = n$.

To complete the proof of (††) and hence the theorem, it is enough, in view of the isomorphisms (‡) and (‡‡), to prove that the cosets $t_j + S^{-1}\mathfrak{p}T$ in $S^{-1}T/(S^{-1}\mathfrak{p}T)$ form a vector-space basis over $S^{-1}R/S^{-1}\mathfrak{p}$. If $t$ is in $S^{-1}T$, then (§) says that $t = \sum_j c_jt_j$ with $c_j$ in $S^{-1}R$. Hence

$$t + S^{-1}\mathfrak{p}T = \sum_j (c_j + S^{-1}\mathfrak{p})(t_j + S^{-1}\mathfrak{p}T),$$

and we have spanning. If $\sum_j (c_j + S^{-1}\mathfrak{p})(t_j + S^{-1}\mathfrak{p}T) = 0 + S^{-1}\mathfrak{p}T$, then $\sum_j c_jt_j$ is in $S^{-1}\mathfrak{p}T$. Thus we can write $\sum_j c_jt_j = \sum_i a_it_i'$ with $a_i \in \mathfrak{p}$ and $t_i' \in S^{-1}T$. Expanding each $t_i'$ according to (§), substituting, and using the uniqueness of the expansion (§), we see for each $j$ that $c_j$ is a sum of products of the $a_i$'s by members of $S^{-1}R$. Therefore each $c_j$ is in $S^{-1}\mathfrak{p}$. This proves the linear independence and establishes (††). □

The case of greatest interest is that $K$ is a finite Galois extension of $F$. In this case the statement of Theorem 9.60 simplifies and will be given in its simplified form as Theorem 9.62. We begin with a lemma.

Lemma 9.61. Let $R$ be a Dedekind domain, let $F$ be its field of fractions, let $K$ be a finite separable extension of $F$, and let $T$ be the integral closure of $R$ in $K$. Suppose that $K$ is Galois over $F$. If $\mathfrak{p}$ is a nonzero prime ideal in $R$ and $\mathfrak{p}T = \prod_{i=1}^{g} P_i^{e_i}$ is a decomposition of $\mathfrak{p}T$ as the product of nonzero prime ideals in $T$, then $\operatorname{Gal}(K/F)$ is transitive on the set of ideals $\{P_1, \dots, P_g\}$.

PROOF. Arguing by contradiction, suppose that $P_j$ is not of the form $\sigma(P_1)$ for any $\sigma$ in $\operatorname{Gal}(K/F)$. By the Chinese Remainder Theorem we can choose an element $t$ of $T$ with $t \equiv 0 \bmod P_j$ and $t \equiv 1 \bmod \sigma(P_1)$ for all $\sigma$. Every $\sigma$ in $\operatorname{Gal}(K/F)$ carries $t$ to a member of $T$ since $t$ and $\sigma(t)$ have the same minimal polynomial over $F$. Corollary 9.58 shows that $N_{K/F}(t) = \prod_{\sigma \in \operatorname{Gal}(K/F)} \sigma(t)$, and consequently $N_{K/F}(t)$ is in $T \cap F = R$. Since the factor $t$ itself is in $P_j$, $N_{K/F}(t)$ is in $P_j$. Therefore $N_{K/F}(t)$ is in $R \cap P_j = \mathfrak{p} \subseteq \prod_{i=1}^{g} P_i^{e_i}$. The right side is contained in $P_1$. Since $P_1$ is prime, some factor $\sigma(t)$ of $N_{K/F}(t)$ is in $P_1$. Then $t$ is in $\sigma^{-1}(P_1)$, in contradiction to the fact that $t \equiv 1 \bmod \sigma(P_1)$ for all $\sigma$. □
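For $T = \mathbb{Z}[i]$ and a prime $p \equiv 1 \bmod 4$, the lemma can be seen concretely: the prime factors of $(p)T$ are $(a + bi)$ and $(a - bi)$ with $p = a^2 + b^2$, and complex conjugation, the nontrivial element of $\operatorname{Gal}(\mathbb{Q}(i)/\mathbb{Q})$, interchanges them. The Python sketch below (hypothetical helper names; a Gaussian integer $a + bi$ is stored as the pair $(a, b)$) finds $a$ and $b$ by brute force and checks that $a - bi$ is not a unit multiple of $a + bi$, so that the two ideals are genuinely different.

```python
import math


def two_squares(p):
    """Return (a, b) with a, b > 0 and a^2 + b^2 = p, or None if there is none."""
    for a in range(1, math.isqrt(p) + 1):
        b = math.isqrt(p - a * a)
        if b > 0 and a * a + b * b == p:
            return a, b
    return None


def associates(z, w):
    """True iff w = u*z for a unit u in {1, -1, i, -i} of Z[i]."""
    a, b = z
    # 1*z, -1*z, i*z = -b + ai, -i*z = b - ai
    return w in {(a, b), (-a, -b), (-b, a), (b, -a)}


for p in (5, 13, 17, 29):
    a, b = two_squares(p)
    # conjugation sends a + bi to a - bi, a generator of a different prime ideal
    assert not associates((a, b), (a, -b))
```

For $p = 2$ the same test reports that $1 + i$ and $1 - i$ are associates, in line with $(2)T$ having only one prime factor.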
Theorem 9.62. Let $R$ be a Dedekind domain, let $F$ be its field of fractions, let $K$ be a finite separable extension of $F$ with $[K : F] = n$, and let $T$ be the integral closure of $R$ in $K$. Suppose that $K$ is Galois over $F$. If $\mathfrak{p}$ is a nonzero prime ideal in $R$ and $\mathfrak{p}T = \prod_{i=1}^{g} P_i^{e_i}$ is a decomposition of $\mathfrak{p}T$ as the product of powers of distinct nonzero prime ideals in $T$, then the ramification indices have $e_1 = \cdots = e_g$, and the residue class degrees $f_i = [T/P_i : R/\mathfrak{p}]$ have $f_1 = \cdots = f_g$. If $e$ and $f$ denote the common value of the $e_i$'s and of the $f_i$'s, then

$$efg = n.$$

PROOF. For $\sigma$ in $\operatorname{Gal}(K/F)$, apply $\sigma$ to the factorization $\mathfrak{p}T = \prod_{i=1}^{g} P_i^{e_i}$, obtaining

$$\mathfrak{p}T = \sigma(P_1)^{e_1} \prod_{i=2}^{g} \sigma(P_i)^{e_i}.$$

Lemma 9.61 shows that $\sigma(P_1)$ can be any $P_j$, and unique factorization of ideals (Theorem 8.55) therefore implies that $e_1 = e_j$. With the same $\sigma$, the fact that $\sigma$ respects the field operations implies that

$$T/P_1 \cong \sigma(T)/\sigma(P_1) = T/P_j,$$

and thus $f_1 = f_j$. Substituting the values of the $e_i$'s and the $f_i$'s into the formula of Theorem 9.60, we obtain $efg = n$. □


Examples with $n = 2$ continued from Section VIII.7.

(1) $R = \mathbb{Z}$ and $T = \mathbb{Z}[\sqrt{-1}]$. In this case, $\mathbb{Z}$ and $T$ are both principal ideal domains. We found three possible behaviors²¹ for the prime factorization of a principal ideal $(p)T$ in $T$ generated by a prime $p > 0$ in $\mathbb{Z}$:

(a) $(p)T$ is prime in $T$ if $p = 4m + 3$. Here $e = g = 1$; so $f = 2$.

(b) $(p)T = (a + ib)(a - ib)$ with $p = a^2 + b^2$ if $p = 4m + 1$. Here $e = 1$ and $g = 2$; so $f = 1$.

(c) $(2)T = (1 + i)^2$. Here $e = 2$ and $g = 1$; so $f = 1$.

(2) $R = \mathbb{Z}$ and $T = \mathbb{Z}[\sqrt{-5}]$. In this case, $T$ is not a unique factorization domain and is in particular not a principal ideal domain. We gave examples of three possible behaviors for the prime factorization of a principal ideal $(p)T$ in $T$ generated by a prime $p > 0$ in $\mathbb{Z}$:

(a) $(11)T$ is prime in $T$. Here $e = g = 1$; so $f = 2$.

(b) $(3)T = (3, 1 + \sqrt{-5})(3, 1 - \sqrt{-5})$. Here $e = 1$ and $g = 2$; so $f = 1$.

(c) $(5)T = (\sqrt{-5})^2$. Here $e = 2$ and $g = 1$; so $f = 1$.

²¹ The notation here fits with the notation in Theorem 9.62 and is different from the notation in Section VIII.7.
(3) $R = \mathbb{C}[x]$ and $T = \mathbb{C}[x, \sqrt{(x-1)x(x+1)}]$. In this case, $R$ is a principal ideal domain, and we saw that $T$ is not a unique factorization domain. We found two possible behaviors for the prime factorization of a principal ideal $(p)T$ in $T$ generated by a prime $p$ in $\mathbb{C}[x]$:

(a) $(x - x_0)T = (x - x_0, y - y_0)(x - x_0, y + y_0)$ if the equal expressions $y_0^2 = (x_0 - 1)x_0(x_0 + 1)$ are not 0. Here $e = 1$ and $g = 2$; so $f = 1$.

(b) $(x - x_0)T = (x - x_0, y)^2$ if $x_0$ is in $\{-1, 0, +1\}$. Here $e = 2$ and $g = 1$; so $f = 1$.

The third type, with $(x - x_0)T$ prime in $T$, does not arise. It cannot arise since $f > 1$ would point to a quadratic extension of $\mathbb{C}$, yet $\mathbb{C}$ is algebraically closed.
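For the two quadratic rings in Examples (1) and (2), $T = \mathbb{Z}[\sqrt{-c}]$ with $c = 1$ or $c = 5$ is isomorphic to $\mathbb{Z}[X]/(X^2 + c)$, so $T/(p)T \cong \mathbb{F}_p[X]/(X^2 + c)$, and the decomposition (*) in the proof of Theorem 9.60 can be read off from how $X^2 + c$ factors modulo $p$: two distinct roots give $\mathbb{F}_p \times \mathbb{F}_p$ (so $g = 2$), a double root gives nilpotent elements (so $e = 2$), and no root gives a field with $p^2$ elements (so $f = 2$). The Python sketch below assumes exactly this presentation of $T$ as the full integral closure, which holds for $c = 1$ and $c = 5$; the function name is hypothetical.

```python
def splitting_type(c: int, p: int) -> tuple:
    """Return (e, f, g) for the prime p in T = Z[sqrt(-c)].

    Assumes T = Z[X]/(X^2 + c) is the full integral closure (true for c = 1, 5),
    so that T/pT = F_p[X]/(X^2 + c) and its structure is read off from the roots.
    """
    roots = [x for x in range(p) if (x * x + c) % p == 0]
    if len(roots) == 2:      # two distinct linear factors: T/pT = F_p x F_p
        return (1, 1, 2)
    if len(roots) == 1:      # one repeated linear factor: nilpotents appear
        return (2, 1, 1)
    return (1, 2, 1)         # irreducible quadratic: T/pT is a field


for c in (1, 5):
    for p in (2, 3, 5, 7, 11, 13):
        e, f, g = splitting_type(c, p)
        assert e * f * g == 2    # Theorem 9.62 with n = 2
```

Counting roots by brute force is adequate for small $p$; for large $p$ one would instead test whether $-c$ is a square modulo $p$.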


17. Two Tools for Computing Galois Groups

In Section 8 we mentioned that the effect of the Fundamental Theorem of Galois Theory is to reduce the extremely difficult problem of finding intermediate fields to the less difficult problem of finding a Galois group. In the intervening sections we have seen some illustrations of the power of this reduction, all in cases in which the Galois group was close at hand.

The problem of finding a Galois group in a particular situation is usually not as easy as in those cases, and it can by no means be considered solved in general. In this section we combine Galois theory with some of the ring theory in the second half of Chapter VIII in order to develop two tools that sometimes help identify particular Galois groups.

Let us think in terms of a finite Galois extension $K$ of the rationals $\mathbb{Q}$. The field $K$ is the splitting field of some irreducible monic polynomial with rational coefficients, and we can scale this polynomial's indeterminate (in effect by multiplying its roots by some nonzero integer) so that the polynomial is monic and has integer coefficients. Thus let $F(X)$ be a monic irreducible polynomial in $\mathbb{Z}[X]$ of some degree $d$, and let $K$ be its splitting field over $\mathbb{Q}$. The members of $\operatorname{Gal}(K/\mathbb{Q})$ are determined by their effect on the $d$ roots of $F(X)$, and hence $\operatorname{Gal}(K/\mathbb{Q})$ may be regarded as a subgroup of the symmetric group $\mathfrak{S}_d$. If $r_1, \dots, r_d$ are the roots of $F(X)$, then the discriminant of $F(X)$ is the member of $K$ defined by

$$D = \prod_{1 \le i < j \le d} (r_j - r_i)^2.$$

This was defined in Section 13 in the cases $d = 2$ and $d = 3$, and we computed the value of $D$ in those cases. The discriminant is an integer under our hypotheses, and it is computable even though the roots $r_1, \dots, r_d$ of $F(X)$ are not at hand. In fact, the proof of Theorem 9.50 indicates that the discriminant $D$ is given by the determinant
$$D = \det\begin{pmatrix} d & a_1 & a_2 & \cdots & a_{d-1} \\ a_1 & a_2 & a_3 & \cdots & a_d \\ a_2 & a_3 & a_4 & \cdots & a_{d+1} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ a_{d-1} & a_d & a_{d+1} & \cdots & a_{2d-2} \end{pmatrix},$$

where $a_j = r_1^j + r_2^j + \cdots + r_d^j$. Problems 36-39 at the end of Chapter VIII show that each of $a_1, \dots, a_{2d-1}$ can be expressed as a polynomial in the elementary symmetric polynomials in $r_1, \dots, r_d$, i.e., in the coefficients of $F(X)$, and doing so in a symbolic manipulation program is manageable for any fixed degree.²²
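As a check on the determinant formula, here is a small Python sketch that obtains the power sums $a_j$ from the coefficients by Newton's identities (a standard substitute for the elimination carried out in footnote 22, not the method of the text) and then evaluates the determinant exactly with rational arithmetic. The helper names are hypothetical, and a monic $F(X) = X^d + b_1X^{d-1} + \cdots + b_d$ is encoded as the list $[1, b_1, \dots, b_d]$.

```python
from fractions import Fraction


def power_sums(coeffs, count):
    """Power sums a_0, ..., a_count of the roots of a monic polynomial.

    coeffs = [1, b1, ..., bd] encodes X^d + b1 X^(d-1) + ... + bd.
    Newton's identities: a_k = -(b1 a_(k-1) + ... + b_(k-1) a_1 + k b_k) for k <= d,
    and a_k = -(b1 a_(k-1) + ... + bd a_(k-d)) for k > d.
    """
    d = len(coeffs) - 1
    a = [d]                               # a_0 = r_1^0 + ... + r_d^0 = d
    for k in range(1, count + 1):
        s = sum(coeffs[i] * a[k - i] for i in range(1, min(k - 1, d) + 1))
        if k <= d:
            s += k * coeffs[k]
        a.append(-s)
    return a


def determinant(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in matrix]
    n = len(m)
    det = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det


def discriminant(coeffs):
    """D = det(a_(i+j)) for 0 <= i, j <= d-1, the matrix of power sums above."""
    d = len(coeffs) - 1
    a = power_sums(coeffs, 2 * d - 2)
    value = determinant([[a[i + j] for j in range(d)] for i in range(d)])
    assert value.denominator == 1         # D is an integer for monic F in Z[X]
    return int(value)


print(discriminant([1, 0, 0, 0, -1, -1]))  # X^5 - X - 1; prints 2869
```

The matrix works because it equals $VV^{\mathsf T}$ for the Vandermonde matrix $V = (r_j^{\,i})$, whose determinant squared is $\prod_{i<j}(r_j - r_i)^2$.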

The first of the two tools that sometimes help in identifying particular Galois groups directly concerns the discriminant: the discriminant is a square if and only if the Galois group is a subgroup of the alternating group. Let us state the result in the context of a general finite Galois extension even though we shall use it only for our Galois extension $K/\mathbb{Q}$.

Proposition 9.63. Let $K/k$ be a finite Galois extension, and suppose that $K$ is the splitting field of a separable polynomial $F(X)$ in $k[X]$ of degree $d$. Let $D$ be the discriminant of $F(X)$, and regard $G = \operatorname{Gal}(K/k)$ as a subgroup of the symmetric group $\mathfrak{S}_d$. Then $D$ is in $k$, and $G$ is a subgroup of the alternating group $\mathfrak{A}_d$ if and only if $D$ is the square of an element of $k$.

Remark. The proof will use Galois theory to show that $D$ is in $k$, and Problems 36-39 at the end of Chapter VIII do not need to be invoked.

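PROOF. Let $r_1, \dots, r_d$ be the roots of $F(X)$, and put $\Delta = \prod_{i<j} (r_j - r_i)$. Under the identification of $G$ with a subgroup of the permutation group $\mathfrak{S}_d$ on $\{1, \dots, d\}$, each $\sigma$ in $G$ has

$$\sigma(\Delta) = \prod_{i<j} (\sigma(r_j) - \sigma(r_i)) = \prod_{i<j} (r_{\sigma(j)} - r_{\sigma(i)}) = (\operatorname{sgn}\sigma) \prod_{i<j} (r_j - r_i) = (\operatorname{sgn}\sigma)\Delta.$$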


²² For example, when $d = 3$, let $F(X) = X^3 - c_1X^2 + c_2X - c_3$. In Mathematica the following program produces $a_1, a_2, a_3, a_4$ as output:

    (* Each Eliminate call removes r1, r2, r3 and returns a relation expressing ak in terms of c1, c2, c3. *)
    e1={a1==r1+r2+r3, r1+r2+r3==c1, r1 r2+r2 r3+r1 r3==c2, r1 r2 r3==c3}
    Eliminate[e1,{r1,r2,r3}]

    e2={a2==r1^2+r2^2+r3^2, r1+r2+r3==c1, r1 r2+r2 r3+r1 r3==c2, r1 r2 r3==c3}
    Eliminate[e2,{r1,r2,r3}]

    e3={a3==r1^3+r2^3+r3^3, r1+r2+r3==c1, r1 r2+r2 r3+r1 r3==c2, r1 r2 r3==c3}
    Eliminate[e3,{r1,r2,r3}]

    e4={a4==r1^4+r2^4+r3^4, r1+r2+r3==c1, r1 r2+r2 r3+r1 r3==c2, r1 r2 r3==c3}
    Eliminate[e4,{r1,r2,r3}]
In particular, the element $D = \Delta^2$ has $\sigma(D) = D$. By Proposition 9.35d, $D$ is in $k$.

If some $\sigma$ in $G$ has $\operatorname{sgn}\sigma = -1$, then $\sigma$ does not fix $\Delta$, and $\Delta$ is not in $k$. Since $\Delta$ is a square root of $D$ and since any two square roots of an element in a field differ at most by a sign, $D$ is not the square of any element of $k$.

Conversely if every $\sigma$ in $G$ has $\operatorname{sgn}\sigma = +1$, then every $\sigma$ fixes $\Delta$, and Proposition 9.35d shows that $\Delta$ is in $k$. Since $D = \Delta^2$, $D$ is the square of the member $\Delta$ of $k$. □
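The two facts used in this proof, $\sigma(\Delta) = (\operatorname{sgn}\sigma)\Delta$ and the square test on $D$, are easy to check numerically. In the Python sketch below (helper names hypothetical) the roots are replaced by arbitrary distinct integers, a permutation is a tuple listing $\sigma(0), \dots, \sigma(d-1)$, and $\operatorname{sgn}\sigma$ is computed by counting inversions. For $k = \mathbb{Q}$ and an integer $D$, $D$ is a square in $\mathbb{Q}$ exactly when it is the square of an integer, which `math.isqrt` decides.

```python
import math
from itertools import combinations, permutations


def sign(perm):
    """sgn of a permutation given as a tuple, computed by counting inversions."""
    inversions = sum(1 for i, j in combinations(range(len(perm)), 2) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1


def delta(roots):
    """Delta = product over i < j of (r_j - r_i)."""
    return math.prod(roots[j] - roots[i] for i, j in combinations(range(len(roots)), 2))


def is_rational_square(D: int) -> bool:
    """An integer is the square of a rational number iff it is a perfect square."""
    return D >= 0 and math.isqrt(D) ** 2 == D


roots = [2, -3, 5, 7]                                  # stand-ins for r_1, ..., r_4
for perm in permutations(range(len(roots))):
    permuted = [roots[perm[i]] for i in range(len(roots))]   # r_{sigma(i)}
    assert delta(permuted) == sign(perm) * delta(roots)
```

Applied to the discriminants $2869$ and $3025000000$ of the examples later in this section, `is_rational_square` returns `False` and `True` respectively.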


The second tool is complicated to prove but simple to state. We reduce the polynomial $F(X)$ modulo $p$ for each prime number $p$ and form the associated finite splitting field. The Galois group for a finite extension of finite fields is cyclic by Proposition 9.40, and we thus obtain a cyclic subgroup of $\mathfrak{S}_d$. The second tool is this: if $p$ does not divide the discriminant of $F(X)$, then this cyclic group as a permutation group is a subgroup of $\operatorname{Gal}(K/\mathbb{Q})$ as a permutation group, up to a relabeling of the symbols. In other words, the order and cycle structure of a generator of the cyclic group are the same as the order and cycle structure of some element of $\operatorname{Gal}(K/\mathbb{Q})$.

Let us formulate the result precisely. In the setting of Theorem 9.62, fix a prime ideal $P$ of $T$ lying in the factorization of $\mathfrak{p}T$. Each member $\sigma$ of $G = \operatorname{Gal}(K/F)$ carries $T$ to itself, but not every $\sigma$ in $G$ carries $P$ to itself. Let $G_P$ be the isotropy subgroup of $G$ at $P$, i.e., let $G_P = \{\sigma \in G \mid \sigma(P) = P\}$. The subgroup $G_P$ is called the decomposition group at $P$. Each $\sigma$ in $G_P$ descends to an automorphism of the field $T/P$ that fixes the subfield $R/\mathfrak{p}$, since $\sigma$ fixes each element of $R$. Thus $\sigma$ defines a member $\bar\sigma$ of $\bar G = \operatorname{Gal}((T/P)/(R/\mathfrak{p}))$ by the formula

$$\bar\sigma(\bar y) = \overline{\sigma(y)}, \qquad \text{where } \bar y = y + P \text{ for } y \in T.$$

It is apparent that $\sigma \mapsto \bar\sigma$ is a homomorphism of $G_P$ into $\bar G$. This homomorphism turns out to yield the result stated informally in the previous paragraph. It has the key property given in Theorem 9.64.


Theorem 9.64. Let $R$ be a Dedekind domain, let $F$ be its field of fractions, let $K$ be a finite separable extension of $F$ with $[K : F] = n$, and let $T$ be the integral closure of $R$ in $K$. Suppose that $K$ is Galois over $F$. Let $\mathfrak{p}$ be a nonzero prime ideal in $R$, let $P = P_1$ be a prime factor in a decomposition $\mathfrak{p}T = \prod_{i=1}^{g} P_i^{e_i}$ of $\mathfrak{p}T$ as the product of powers of distinct nonzero prime ideals in $T$, and suppose that $T/P$ is a Galois extension of $R/\mathfrak{p}$. Let $G = \operatorname{Gal}(K/F)$, $G_P = \{\sigma \in G \mid \sigma(P) = P\}$, and $\bar G = \operatorname{Gal}((T/P)/(R/\mathfrak{p}))$. Then the group homomorphism $\sigma \mapsto \bar\sigma$ of $G_P$ into $\bar G$ carries $G_P$ onto $\bar G$.
Remarks. In our application with $R = \mathbb{Z}$, $T/P$ and $R/\mathfrak{p}$ are finite fields, and Proposition 9.40 shows that $T/P$ is a Galois extension of $R/\mathfrak{p}$ with no further assumptions.

PROOF. Let $K_d$ be the fixed field of $G_P$ within $K$. Theorem 9.38 shows that $\operatorname{Gal}(K/K_d) = G_P$. Let $T_d$ be the integral closure of $R$ in $K_d$; this is a Dedekind domain, and $T$ is the integral closure of $T_d$ in $K$. We are going to apply Theorem 9.62 with $R$ in the theorem replaced²³ by $T_d$.

Proposition 8.43 shows that $\mathfrak{p}_d = T_d \cap P$ is a nonzero prime ideal of $T_d$. Since every member of $G_P$ carries $P$ to itself and since $G_P$ is the full Galois group of $K$ over $K_d$, Lemma 9.61 shows that $P$ is the only nonzero prime ideal of $T$ whose intersection with $T_d$ is $\mathfrak{p}_d$. Therefore $\mathfrak{p}_dT = P^e$ for some integer $e \ge 1$.

As always, we have a field mapping $R/\mathfrak{p} \to T_d/\mathfrak{p}_d$. Let us show that this mapping is onto $T_d/\mathfrak{p}_d$. For any given $u$ in $T_d$, we are to produce $r$ in $R$ with

$$r \equiv u \bmod \mathfrak{p}_d. \qquad (*)$$

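Each $\sigma$ in $G$ that is not in $G_P$ has $\sigma^{-1}P \ne P$, and the previous paragraph shows that the nonzero prime ideal $\mathfrak{p}_\sigma = T_d \cap \sigma^{-1}P$ of $T_d$ has $\mathfrak{p}_\sigma \ne T_d \cap P$. Therefore $\mathfrak{p}_\sigma + \mathfrak{p}_d = T_d$, and the Chinese Remainder Theorem (Theorem 8.27) shows that we can find an element $v$ of $T_d$ with

$$v \equiv u \bmod \mathfrak{p}_d \quad\text{and}\quad v \equiv 1 \bmod \mathfrak{p}_\sigma$$

for all $\sigma$ that lie in $G$ but not $G_P$. The first congruence implies that $v - u$ is in $\mathfrak{p}_d = T_d \cap P \subseteq P$, hence that

$$v \equiv u \bmod P, \qquad (**)$$

while the second congruence implies that $v - 1$ is in $\mathfrak{p}_\sigma = T_d \cap \sigma^{-1}P \subseteq \sigma^{-1}P$, hence that $\sigma(v - 1)$ lies in $P$. Therefore

$$\sigma(v) \equiv 1 \bmod P \quad \text{for all } \sigma \text{ in } G \text{ but not } G_P. \qquad (\dagger)$$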

Put $r = N_{K_d/F}(v)$. Since the splitting field of the minimal polynomial of $v$ over $F$ is contained in $K$, Corollary 9.58 shows that $r$ is the product of the elements $\sigma(v)$ for $\sigma$ in $G/G_P$. Each of these is in $T$, and hence $N_{K_d/F}(v)$ is in $T$. Since $N_{K_d/F}(v)$ is also in $F$, $r = N_{K_d/F}(v)$ is in $T \cap F = R$. If we use $\sigma = 1$ as the representative of the identity coset of $G/G_P$, then we have

$$r = N_{K_d/F}(v) = v \Bigl(\prod_{\substack{\text{some } \sigma\text{'s} \\ \text{not in } G_P}} \sigma(v)\Bigr).$$


²³ Consequently it would not have been sufficient to prove Theorem 9.62 when the ring $R$ is a principal ideal domain.
The factor $v$ is congruent to $u$ mod $P$ by (**), and each factor in parentheses is congruent to 1 mod $P$ by (†). Therefore $r \equiv u \bmod P$, and $r - u$ is in $P$. Since $r - u$ is in $T_d$, $r - u$ is in $T_d \cap P = \mathfrak{p}_d$. This proves (*). Consequently we can identify $\bar G = \operatorname{Gal}((T/P)/(R/\mathfrak{p}))$ with $\operatorname{Gal}((T/P)/(T_d/\mathfrak{p}_d))$.

Choose $\bar x_1$ in $T/P$ with $T/P = (T_d/\mathfrak{p}_d)[\bar x_1]$; this choice is possible by the assumed separability of $(T/P)/(R/\mathfrak{p})$. Let $x_1$ be a member of $T$ with $\bar x_1 = x_1 + P$, and let $M(X)$ be the minimal polynomial of $x_1$ over $K_d$. Since $x_1$ is in $T$, the coefficients of $M(X)$ are in $T_d$. Let $\bar M(X)$ be the corresponding member of $(T_d/\mathfrak{p}_d)[X]$, given by the substitution homomorphism that takes $T_d$ to $T_d/\mathfrak{p}_d$ and takes $X$ to $X$. Since $K/K_d$ is normal, $M(X)$ splits over $K$. Write $x_1, \dots, x_m$ for its roots; these are in $T$.

Let $\tau$ be given in $\bar G$, and suppose that $\tau(\bar x_1) = \bar x_j$. Since $M(X)$ is irreducible over $K_d$, the Galois group $\operatorname{Gal}(K/K_d) = G_P$ is transitive on its roots. Choose $\sigma$ in $G_P$ with $\sigma(x_1) = x_j$. Then $\bar\sigma(\bar x_1) = \bar x_j$. Since $\bar\sigma$ and $\tau$ agree on the generator $\bar x_1$ of $T/P$ over $T_d/\mathfrak{p}_d$, they agree on $T/P$. Therefore $\tau$ is exhibited as the image of $\sigma$ under the homomorphism of the theorem, and the proof is complete. □

A first consequence of Theorem 9.64 is that we get interpretations of the integers $e$, $f$, and $g$, and they will be helpful to us. Galois theory gives us $|G| = n$, and Theorem 9.62 says that $efg = n$. The transitivity in Lemma 9.61 says that $G$ acts transitively on the set $\{P_1, \dots, P_g\}$, and the isotropy subgroup at $P = P_1$ is $G_P$. Hence $g|G_P| = |G|$, and $|G_P| = n/g = ef$. Galois theory gives us $|\bar G| = f$, and the fact that $G_P$ maps onto $\bar G$ says that $G_P/\ker \cong \bar G$; therefore $|\ker| = |G_P|/|\bar G| = (ef)/f = e$. We conclude that $g$ is the number of cosets modulo $G_P$, $e$ is the order of the kernel of the homomorphism in Theorem 9.64, and $f$ is the order of the group $\bar G$, which is cyclic in our application with $R = \mathbb{Z}$.
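A small instance of this bookkeeping: take $T = \mathbb{Z}[i]$ over $R = \mathbb{Z}$ and an odd prime $p$. When $p \equiv 3 \bmod 4$ we have $e = g = 1$ and $f = 2$, so $G_P = G$ has order 2 and maps isomorphically onto $\bar G$, whose generator is the Frobenius map $x \mapsto x^p$ of the field $T/pT$ with $p^2$ elements. Thus complex conjugation should reduce to $x \mapsto x^p$. The Python sketch below (helper names hypothetical) checks this by storing $a + bi$ modulo $p$ as the pair $(a, b)$. When $p \equiv 1 \bmod 4$ the check fails, as it should: then $g = 2$, conjugation is not in $G_P$, and $x \mapsto x^p$ is the identity on $T/pT \cong \mathbb{F}_p \times \mathbb{F}_p$.

```python
def mul(x, y, p):
    """Multiply a + bi and c + di in Z[i]/pZ[i], elements stored as pairs (a, b)."""
    a, b = x
    c, d = y
    return ((a * c - b * d) % p, (a * d + b * c) % p)


def power(x, n, p):
    """Compute x^n in Z[i]/pZ[i] by repeated squaring."""
    result = (1, 0)
    while n:
        if n & 1:
            result = mul(result, x, p)
        x = mul(x, x, p)
        n >>= 1
    return result


def frobenius_is_conjugation(p):
    """True iff (a + bi)^p = a - bi for every a + bi in Z[i]/pZ[i]."""
    return all(
        power((a, b), p, p) == (a, (-b) % p)
        for a in range(p)
        for b in range(p)
    )


print([p for p in (3, 5, 7, 11, 13, 17, 19) if frobenius_is_conjugation(p)])
```

The primes reported are exactly those congruent to 3 modulo 4 in the list, namely 3, 7, 11, and 19.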

In the setting of interest for current purposes, we are taking $R = \mathbb{Z}$, $F = \mathbb{Q}$, and $K$ equal to the splitting field of a given monic irreducible polynomial $F(X)$ of degree $d$ in $\mathbb{Z}[X]$. We will be using Theorem 9.64 for various choices of $\mathfrak{p} = (p)$ in $\mathbb{Z}$ to make progress on identifying $\operatorname{Gal}(K/\mathbb{Q})$. In order to identify $\bar G$ with the subgroup $G_P$ of $G$, we need the kernel of the homomorphism of $G_P$ onto $\bar G$ to be trivial. From the previous paragraph we know that the condition in question is that $e = 1$. We postpone to Chapter V of Advanced Algebra any justification of the assertion that $e = 1$ if $p$ does not divide the discriminant of $F(X)$.

In previous sections we have identified $\operatorname{Gal}(K/\mathbb{Q})$ in some cases when the Galois group is relatively small compared with the degree $d$ of the polynomial. The method now is helpful when the Galois group is relatively large compared with $d$.

Let us be sure when $e = 1$ that the theorem is telling us not only that $G_P$ is isomorphic to $\bar G$ as an abstract group, but also that the cycle structure of the elements of $\bar G$ is the same as the cycle structure of the elements of $G_P$. For this purpose we ignore the proof of the theorem and concentrate only on the statement. Assuming that $p$ does not divide the discriminant, let $\bar F(X)$ be the reduction of $F(X)$ modulo $p$, let $r_1, \dots, r_d$ be the roots of $F(X)$ in $T$, and let $\bar r_1, \dots, \bar r_d$ be the images of $r_1, \dots, r_d$ under the quotient homomorphism $T \to T/P$. The elements $\bar r_1, \dots, \bar r_d$ are distinct since $p$ does not divide the discriminant of $F(X)$ (if $\bar r_i = \bar r_j$ for some $i \ne j$, then $D$ would lie in $P \cap \mathbb{Z} = p\mathbb{Z}$). Any member $\sigma$ of $G = \operatorname{Gal}(K/\mathbb{Q})$ permutes $r_1, \dots, r_d$ and is determined by the resulting permutation since $K$ is assumed to be generated by $r_1, \dots, r_d$. Under the assumption that $\sigma$ is in $G_P$, $\sigma$ descends to an automorphism $\bar\sigma$ of $T/P$. This automorphism $\bar\sigma$ acts on the set of elements $\bar r_1, \dots, \bar r_d$, permuting them. Since the mapping of the $r_i$'s to the $\bar r_i$'s is one-one, the resulting permutation of the subscripts $1, \dots, d$ is the same.

When $p$ varies, we cannot match the elements $\bar r_1, \dots, \bar r_d$ for one value of $p$ with those for another value of $p$, because we have no direct knowledge of $r_1, \dots, r_d$. Thus we cannot directly compare the permutation groups $\bar G$ that we obtain for different $p$'s. But at least we know their cycle structure.

To apply the theory, we factor $\bar F(X)$ quickly with a symbolic manipulation program, and we obtain the Galois group of a splitting field of $\bar F(X)$ by inspection, together with the cycle structure of its elements. Specifically an irreducible factor of degree $m$ contributes an $m$-cycle to a generator, and the cycles corresponding to distinct irreducible factors are disjoint. Then we put together the information from various $p$'s and see what elements must be in $\operatorname{Gal}(K/\mathbb{Q})$, up to a relabeling of indices.
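Only the degrees of the irreducible factors of $\bar F(X)$ matter for this step, so a full factorization is not needed. The Python sketch below uses distinct-degree factorization over $\mathbb{F}_p$, a standard algorithm that the text does not develop: once the factors of degree less than $k$ have been divided out, $\gcd(\bar F, X^{p^k} - X)$ is the product of the irreducible factors of degree exactly $k$. It assumes, as the method does, that $p$ does not divide the discriminant, so that $\bar F(X)$ has no repeated factors. The helper names are hypothetical, and polynomials are lists of coefficients from the constant term up.

```python
def trim(f):
    """Drop zero leading coefficients (stored at the end of the list)."""
    while f and f[-1] == 0:
        f.pop()
    return f


def poly_divmod(a, b, p):
    """Quotient and remainder of a by b over F_p; b must be nonzero."""
    a = trim([x % p for x in a])
    q = [0] * max(len(a) - len(b) + 1, 0)
    inv_lead = pow(b[-1], -1, p)
    while len(a) >= len(b):
        coef = a[-1] * inv_lead % p
        shift = len(a) - len(b)
        q[shift] = coef
        for i, c in enumerate(b):
            a[shift + i] = (a[shift + i] - coef * c) % p
        trim(a)
    return trim(q), a


def poly_mulmod(a, b, f, p):
    """a * b reduced modulo f over F_p."""
    if not a or not b:
        return []
    prod = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            prod[i + j] = (prod[i + j] + x * y) % p
    return poly_divmod(prod, f, p)[1]


def poly_gcd(a, b, p):
    """Monic gcd over F_p (a must be nonzero)."""
    a, b = trim([x % p for x in a]), trim([x % p for x in b])
    while b:
        a, b = b, poly_divmod(a, b, p)[1]
    inv = pow(a[-1], -1, p)
    return [x * inv % p for x in a]


def factor_degrees(coeffs, p):
    """Degrees of the irreducible factors of F mod p (F monic, squarefree mod p)."""
    f = trim([c % p for c in coeffs])
    x = [0, 1]
    h = x[:]                                   # holds X^(p^k) mod f
    degrees, k = [], 0
    while len(f) - 1 >= 2 * (k + 1):
        k += 1
        exp, base, result = p, h, [1]          # h <- h^p mod f by repeated squaring
        while exp:
            if exp & 1:
                result = poly_mulmod(result, base, f, p)
            base = poly_mulmod(base, base, f, p)
            exp >>= 1
        h = result
        diff = [(h[i] if i < len(h) else 0) - (x[i] if i < len(x) else 0)
                for i in range(max(len(h), len(x)))]
        g = poly_gcd(f, diff, p)
        if len(g) > 1:                         # factors of degree exactly k
            degrees += [k] * ((len(g) - 1) // k)
            f = poly_divmod(f, g, p)[0]
            h = poly_divmod(h, f, p)[1]
    if len(f) > 1:                             # what remains is irreducible
        degrees.append(len(f) - 1)
    return sorted(degrees)


print(factor_degrees([-1, -1, 0, 0, 0, 1], 2))   # X^5 - X - 1 mod 2: [2, 3]
```

The returned list is the cycle type of a generator of $\bar G$; for instance $X^5 - X - 1$ modulo 2 splits into factors of degrees 2 and 3.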

Example 1. $F(X) = X^5 - X - 1$. The discriminant is $D = 2869 = 19 \cdot 151$. Thus the method may be used with any prime number other than 19 and 151. Here is the factorization for a few primes, together with the cycle structure within $\mathfrak{S}_5$ for a generator of $\bar G$:

$$\begin{array}{lll}
p & \bar F(X) & \text{Cycle lengths} \\ \hline
2 & (X^2 + X + 1)(X^3 + X^2 + 1) & 2, 3 \\
3 & X^5 + 2X + 2 & 5 \\
17 & (X + 9)(X + 11)(X^3 + 14X^2 + 12X + 6) & 1, 1, 3 \\
23 & (X + 9)(X^4 + 14X^3 + 12X^2 + 7X + 5) & 1, 4
\end{array}$$

For comparison, $p = 19$ gives $\bar F(X) = (X + 6)^2(X^3 + 7X^2 + 13X + 10)$, but we cannot use this prime since it divides the discriminant. It is enough to use the information from $p = 2$ and $p = 3$. The irreducibility modulo 3 implies irreducibility over $\mathbb{Q}$. From $p = 3$, we obtain a 5-cycle in $\operatorname{Gal}(K/\mathbb{Q})$. From $p = 2$, we obtain the product of a 2-cycle and a 3-cycle, and the cube of this element is a 2-cycle. In the example in Section 11 following the statement of Theorem 9.44, we saw in effect that the only subgroup of $\mathfrak{S}_5$ containing a 5-cycle and a 2-cycle is $\mathfrak{S}_5$ itself. Therefore $\operatorname{Gal}(K/\mathbb{Q}) \cong \mathfrak{S}_5$.
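The group-theoretic step can be confirmed by brute force for one labeling of the cycles (the argument in the text covers every labeling; the code checks only the one chosen here). In the Python sketch below, with hypothetical helper names, a permutation of $\{0, 1, 2, 3, 4\}$ is a tuple of images; the sketch cubes an element of cycle type $(2, 3)$ and forms the subgroup generated by it and a 5-cycle.

```python
def compose(s, t):
    """Return s∘t, i.e. i -> s[t[i]]; permutations are tuples of images."""
    return tuple(s[t[i]] for i in range(len(t)))


def generated_group(gens):
    """All products of the generators; for a finite group this is the subgroup they generate."""
    identity = tuple(range(len(gens[0])))
    seen, frontier = {identity}, [identity]
    while frontier:
        new = []
        for g in frontier:
            for s in gens:
                h = compose(s, g)
                if h not in seen:
                    seen.add(h)
                    new.append(h)
        frontier = new
    return seen


five_cycle = (1, 2, 3, 4, 0)          # (0 1 2 3 4), cycle type from p = 3
two_three = (1, 0, 3, 4, 2)           # (0 1)(2 3 4), cycle type from p = 2
cube = compose(two_three, compose(two_three, two_three))
print(cube)                           # (1, 0, 2, 3, 4): the transposition (0 1)
print(len(generated_group([five_cycle, two_three])))   # 120 = |S_5|
```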




Example  2.  F(X)  =  X5  +  10X3  -  10X2  +  35X  -  18.  The  discriminant 
is  D  =  3025000000  =  2658112,  a  perfect  square.  Thus  the  Galois  group  is  a 
subgroup  of  the  alternating  group  2I5.  The  method  using  reduction  modulo  p 
may  be  used  with  any  prime  other  than  2,  5,  and  11.  Here  is  the  factorization  for 
a  few  primes,  together  with  the  cycle  structure  within  S5  for  a  generator  of  G: 


P 


F(X) 


Cycle  lengths 


3  X(X  +  2)(X3  +  X2  +  2X+  1)  1.1,3 

7  X5  +  3X3  +  4X2  +  3  5 

17  (X  +  14)(X2  +  5X  +  14)(X2  +  15X  +  15)  1,2,2 


It is enough to use the information from $p = 3$ and $p = 7$. The irreducibility modulo 7 implies irreducibility over $\mathbb{Q}$. From $p = 7$, we obtain a 5-cycle in $\mathrm{Gal}(K/\mathbb{Q})$. From $p = 3$, we obtain a 3-cycle. Any 5-cycle and any 3-cycle together generate all of $\mathfrak{A}_5$. In fact, the generated subgroup must have order divisible by 15, hence must have order 15, 30, or 60. It cannot be of order 15 because every group of order 15 is cyclic and $\mathfrak{A}_5$ has no elements of order 15. It cannot be of order 30 because $\mathfrak{A}_5$ is simple and subgroups of index 2 have to be normal. Hence it is all of $\mathfrak{A}_5$.
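The group-theoretic step can be checked by brute force. The sketch below assumes permutations of $\{0, \ldots, 4\}$ stored as Python tuples, so the labels are shifted down by one from the text, and its helper names are hypothetical. It computes the subgroup generated by given permutations by closing up under composition, which in a finite group yields the generated subgroup. It confirms that a 5-cycle together with any 3-cycle generates a group of order 60 consisting of even permutations, and, as used in Example 1, that a 5-cycle and a 2-cycle generate all 120 elements of $\mathfrak{S}_5$.

```python
from itertools import permutations

def compose(s, t):
    """The permutation s∘t (apply t first); permutations are tuples."""
    return tuple(s[i] for i in t)

def generated_group(gens):
    """Closure of the identity under left multiplication by the generators."""
    identity = tuple(range(len(gens[0])))
    group, frontier = {identity}, [identity]
    while frontier:
        new = []
        for g in frontier:
            for s in gens:
                h = compose(s, g)
                if h not in group:
                    group.add(h)
                    new.append(h)
        frontier = new
    return group

def is_even(perm):
    """A permutation is even exactly when its number of inversions is even."""
    n = len(perm)
    return sum(perm[i] > perm[j] for i in range(n) for j in range(i + 1, n)) % 2 == 0

def cycle(n, *points):
    """The cycle (points[0] points[1] ...) as a permutation of {0, ..., n-1}."""
    perm = list(range(n))
    for a, b in zip(points, points[1:] + points[:1]):
        perm[a] = b
    return tuple(perm)

five = cycle(5, 0, 1, 2, 3, 4)
A5 = generated_group([five, cycle(5, 0, 1, 2)])
print(len(A5), all(is_even(g) for g in A5))            # 60 True
print(len(generated_group([five, cycle(5, 0, 1)])))   # 120

# Every choice of 3-cycle gives the same order, as the argument above predicts.
three_cycles = {cycle(5, *pts) for pts in permutations(range(5), 3)}
print({len(generated_group([five, t])) for t in three_cycles})   # {60}
```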


Example 3. Galois group $\mathfrak{S}_d$. Given $d > 4$, let us see how to form an irreducible $F(X)$ for which $\mathrm{Gal}(K/\mathbb{Q})$ is all of $\mathfrak{S}_d$. For any degree $d$ and any prime number $\ell$, there exists at least one irreducible monic polynomial of degree $d$ in $\mathbb{F}_\ell[X]$; the reason is that the finite field $\mathbb{F}_{\ell^d}$ is a simple extension of $\mathbb{F}_\ell$ by Corollary 9.19. Let $H_{d,2}(X)$ be such a polynomial of degree $d$ for $\ell = 2$, and let $H_{d-1,3}(X)$ be such a polynomial of degree $d - 1$ for $\ell = 3$. Then let $p$ be a prime greater than $d$, and let $H_{2,p}(X)$ be an irreducible monic polynomial of degree 2 in $\mathbb{F}_p[X]$. We can regard each of $H_{d,2}(X)$, $H_{d-1,3}(X)$, and $H_{2,p}(X)$ as in $\mathbb{Z}[X]$ by reinterpreting their coefficients as integers. Consider the congruences


$$F(X) \equiv H_{d,2}(X) \bmod (2),$$

$$F(X) \equiv X\,H_{d-1,3}(X) \bmod (3),$$

$$F(X) \equiv \Bigl(\prod_{k=0}^{d-3} (X - k)\Bigr) H_{2,p}(X) \bmod (p),$$


in $\mathbb{Z}[X]$. Since the sum of any two of the three ideals $(2)$, $(3)$, and $(p)$ of $\mathbb{Z}[X]$ is $\mathbb{Z}[X]$, the Chinese Remainder Theorem (Theorem 8.27) implies that there exists a simultaneous solution $F(X)$ to these congruences in $\mathbb{Z}[X]$, and we may take $F(X)$ to be monic of degree $d$. Let $K$ be a splitting field for $F(X)$ over $\mathbb{Q}$. Our method applies to the primes 2, 3, and $p$ since none of the three polynomials has any repeated factors. The result of applying the method is that $\mathrm{Gal}(K/\mathbb{Q})$ contains a $d$-cycle, a $(d-1)$-cycle, and a 2-cycle. Let us see that the subgroup generated by these three elements is all of $\mathfrak{S}_d$. We may assume that the $(d-1)$-cycle is $(1\ 2\ \cdots\ d{-}1)$. Without loss of generality, the 2-cycle is either $(1\ j)$ with $j < d$ or is $(k\ d)$ with $k < d$. In the first case some power of the $d$-cycle is a permutation $\tau$ with $\tau(1) = d$; if $\sigma$ denotes the 2-cycle $(1\ j)$, then Lemma 4.41 shows that $\tau\sigma\tau^{-1}$ is the 2-cycle $(d\ \tau(j))$, and this is of the form $(k\ d)$ with $k < d$. Thus we may assume in any event that $\mathrm{Gal}(K/\mathbb{Q})$ contains $(1\ 2\ \cdots\ d{-}1)$ and some 2-cycle $(k\ d)$ with $k < d$. Conjugating $(k\ d)$ by powers of $(1\ 2\ \cdots\ d{-}1)$, we see that $\mathrm{Gal}(K/\mathbb{Q})$ contains every 2-cycle $(k\ d)$ with $k < d$. For $1 \le k \le d - 2$, we then find that $\mathrm{Gal}(K/\mathbb{Q})$ contains

$$(k\ d)(k{+}1\ d)(k\ d) = (k\ k{+}1).$$

So $\mathrm{Gal}(K/\mathbb{Q})$ contains $(1\ 2), (2\ 3), \ldots, (d{-}2\ d{-}1)$, and we have already seen that it contains $(d{-}1\ d)$. These $d - 1$ transpositions generate the full symmetric group, and therefore $\mathrm{Gal}(K/\mathbb{Q}) = \mathfrak{S}_d$.
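The construction can be carried out concretely for a small degree. The Python sketch below takes $d = 5$ with the illustrative choices $H_{5,2}(X) = X^5 + X^2 + 1$, $H_{4,3}(X) = X^4 + X + 2$, $p = 7$, and $H_{2,7}(X) = X^2 + 1$; these choices are assumptions made for the example, and the code confirms their irreducibility by trial division. The Chinese Remainder Theorem step is done one coefficient at a time by a direct search modulo $2 \cdot 3 \cdot 7 = 42$, which is the simplest correct approach for moduli this small. Helper names are hypothetical.

```python
from itertools import product

def poly_mul(a, b, m):
    """Product of coefficient lists (lowest degree first), reduced modulo m."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % m
    return out

def is_irreducible(f, p):
    """Trial division: monic f is irreducible over F_p iff it has no monic
    factor of degree 1, ..., deg(f) // 2."""
    n = len(f) - 1
    for d in range(1, n // 2 + 1):
        for tail in product(range(p), repeat=d):
            g = list(tail) + [1]
            r = [c % p for c in f]
            for i in range(n - d, -1, -1):        # long division by monic g
                c = r[i + d]
                for j, gj in enumerate(g):
                    r[i + j] = (r[i + j] - c * gj) % p
            if not any(r[:d]):
                return False
    return True

d, p = 5, 7                       # illustrative choices; the text needs p > d
H52 = [1, 0, 1, 0, 0, 1]          # X^5 + X^2 + 1  over F_2
H43 = [2, 1, 0, 0, 1]             # X^4 + X + 2    over F_3
H2p = [1, 0, 1]                   # X^2 + 1        over F_7
assert is_irreducible(H52, 2) and is_irreducible(H43, 3) and is_irreducible(H2p, p)

target2 = H52                                 # F = H_{d,2}        mod 2
target3 = poly_mul([0, 1], H43, 3)            # F = X H_{d-1,3}    mod 3
linear = [1]
for k in range(d - 2):                        # X (X - 1) ... (X - (d-3))
    linear = poly_mul(linear, [-k % p, 1], p)
targetp = poly_mul(linear, H2p, p)            # F = (product) H_{2,p}  mod p

# Chinese Remainder Theorem, one coefficient at a time, by direct search.
F = [next(c for c in range(6 * p)
          if c % 2 == a and c % 3 == b and c % p == e)
     for a, b, e in zip(target2, target3, targetp)]
print(F)   # lowest degree first: [21, 2, 25, 24, 18, 1]
```

With these choices one admissible polynomial is $F(X) = X^5 + 18X^4 + 24X^3 + 25X^2 + 2X + 21$; by the argument of the example, its Galois group over $\mathbb{Q}$ is $\mathfrak{S}_5$.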

18.  Problems 

1. Take as known that the polynomial $X^3 - 3X + 4$ is irreducible over $\mathbb{Q}$, and let $r$ be a complex root of it. In the field $\mathbb{Q}(r)$, find a multiplicative inverse for $r^2 + r + 1$ and express it in the form $ar^2 + br + c$ with $a, b, c$ in $\mathbb{Q}$.
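One standard way to carry out a computation like this is the extended Euclidean algorithm in $\mathbb{Q}[X]$: since $X^3 - 3X + 4$ is irreducible and does not divide $X^2 + X + 1$, the two polynomials have gcd 1, and a Bézout relation yields the inverse. The Python sketch below uses exact rational arithmetic from the standard `fractions` module; its helper names are hypothetical, and it is one possible approach rather than necessarily the intended one. It computes the inverse of $g$ modulo $f$ and checks the result by multiplying back.

```python
from fractions import Fraction

def trim(a):
    """Drop zero coefficients at the high-degree end (lists are lowest degree first)."""
    while a and a[-1] == 0:
        a.pop()
    return a

def poly_mul(a, b):
    if not a or not b:
        return []
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return trim(out)

def poly_sub(a, b):
    n = max(len(a), len(b))
    a = a + [0] * (n - len(a))
    b = b + [0] * (n - len(b))
    return trim([Fraction(x) - y for x, y in zip(a, b)])

def poly_divmod(a, b):
    """Quotient and remainder in Q[X]; b must be nonzero."""
    a = [Fraction(c) for c in a]
    q = [Fraction(0)] * max(len(a) - len(b) + 1, 1)
    for i in range(len(a) - len(b), -1, -1):
        c = a[i + len(b) - 1] / b[-1]
        q[i] = c
        for j, bj in enumerate(b):
            a[i + j] -= c * bj
    return trim(q), trim(a[:len(b) - 1])

def inverse_mod(g, f):
    """Inverse of g modulo f in Q[X], assuming gcd(g, f) = 1."""
    r0, r1 = trim([Fraction(c) for c in f]), poly_divmod(g, f)[1]
    s0, s1 = [], [Fraction(1)]            # invariant: s_i * g = r_i (mod f)
    while len(r1) > 1:                    # stop once r1 is a constant
        q, r = poly_divmod(r0, r1)
        r0, r1 = r1, r
        s0, s1 = s1, poly_sub(s0, poly_mul(q, s1))
    if not r1:
        raise ValueError("g is not invertible modulo f")
    return poly_divmod([x / r1[0] for x in s1], f)[1]

F = [4, -3, 0, 1]     # X^3 - 3X + 4, lowest degree first
g = [1, 1, 1]         # r^2 + r + 1
inv = inverse_mod(g, F)
# (r^2 + r + 1) * inv reduces to 1 in Q(r) = Q[X]/(F(X)):
assert poly_divmod(poly_mul(inv, g), F)[1] == [1]
```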

2. Suppose that $R$ is an integral domain and that $F$ is a subring that is a field, so that $R$ can be considered as a vector space over $F$. Prove that if $\dim_F R$ is finite, then $R$ is a field.

3.  Let  K  be  a  subfield  of  C  that  is  not  a  subfield  of  R.  Prove  that  K  is  topologically 
dense  in  C. 

4. Let $K = k(x)$ be a transcendental extension of the field $k$, and let $y$ be a member of $K$ that is not in $k$. Prove that $k(x)$ is an algebraic extension of $k(y)$.

5. What is a necessary and sufficient condition on an integer $N > 0$ for the positive square root of $N$ to be in the subfield $\mathbb{Q}(\sqrt[3]{2})$ of $\mathbb{R}$?

6. The polynomials $F(X) = X^3 + X + 1$ and $G(Y) = Y^3 + Y^2 + 1$ are irreducible over $\mathbb{F}_2$. Let $K$ be the field $K = \mathbb{F}_2[X]/(F(X))$, and let $L$ be the field $L = \mathbb{F}_2[Y]/(G(Y))$. Since $K$ and $L$ are two fields of order 8, they must be isomorphic. Find an explicit isomorphism.

7.  Can  a  field  of  order  8  have  a  subfield  of  order  4?  Why  or  why  not? 

8. If $K$ is a finite field, prove that the product of the nonzero elements of $K$ is $-1$. (Educational note: When $K$ is $\mathbb{F}_p$, this result reduces to Wilson’s Theorem, given as Problem 8 at the end of Chapter IV.)

9. Suppose that $K/k$ is a finite extension of the form $K = k(r)$ with $[K : k]$ odd. Prove that $K = k(r^2)$.
10. Suppose that $K/k$ is a finite extension of fields and that $K = k[r, s]$. Prove that if $[k(r) : k]$ is relatively prime to $[k(s) : k]$, then

(a) the minimal polynomial of $r$ over $k$ is irreducible over $k(s)$,

(b) $[K : k] = [k(r) : k]\,[k(s) : k]$.

11. In $\mathbb{C}$, let $\beta = \sqrt[3]{2}$, $\omega = \frac{1}{2}(-1 + \sqrt{-3})$, and $\alpha = \omega\beta$.

(a) Prove for all $c$ in $\mathbb{Q}$ that $\gamma = \beta + c\alpha$ is a root of some sixth-degree polynomial of the form $X^6 + aX^3 + b$.

(b) Prove that the minimal polynomial of $\beta + \alpha$ over $\mathbb{Q}$ has degree 3.

(c) Prove that the minimal polynomial of $\beta - \alpha$ over $\mathbb{Q}$ has degree 6.

12.  Suppose  that  k  is  a  finite  field  and  that  F(X)  is  a  member  of  k[X]  whose  derivative 
is  the  0  polynomial.  Prove  that  F(X)  is  reducible  over  k. 

13. Let $k$ be a field, let $F(X)$ be a separable polynomial in $k[X]$, let $K$ be a splitting field of $F(X)$ over $k$, and let $r_1, \ldots, r_n$ be the roots of $F(X)$ in $K$. Regard $\mathrm{Gal}(K/k)$ as a subgroup of the symmetric group $\mathfrak{S}_n$.

(a) Prove that $\mathrm{Gal}(K/k)$ is transitive on $\{r_1, \ldots, r_n\}$ if and only if $F(X)$ is irreducible over $k$.

(b) Show that the cyclotomic polynomial $\Phi_8(X)$ is an example with $k = \mathbb{Q}$ and $n = 4$ for which $\mathrm{Gal}(K/k)$ is transitive but $\mathrm{Gal}(K/k)$ contains no 4-cycle.

(c) Prove that if $n$ is prime and $F(X)$ is irreducible over $k$, then $\mathrm{Gal}(K/k)$ contains an $n$-cycle.

14. Let $a_1, \ldots, a_n$ be relatively prime square-free integers $\ge 2$, and define $L_k = \mathbb{Q}(\sqrt{a_1}, \ldots, \sqrt{a_k})$ for $0 \le k \le n$.

(a) Show for each $k$ that $[L_k : \mathbb{Q}] = 2^l$ with $0 \le l \le k$.

(b) Suppose for a particular $k$ that $[L_k : \mathbb{Q}] = 2^k$. Exhibit a vector-space basis of $L_k$ over $\mathbb{Q}$, and describe the members of $\mathrm{Gal}(L_k/\mathbb{Q})$ by telling the effect of each member on all basis vectors of $L_k$ over $\mathbb{Q}$.

(c) Suppose for a particular $k < n$ that $[L_k : \mathbb{Q}] = 2^k$. Assume that $\sqrt{a_{k+1}}$ lies in $L_k$, and let $\sqrt{a_{k+1}}$ be expanded in terms of the basis of (b). Show that application of the members of $\mathrm{Gal}(L_k/\mathbb{Q})$ leads to a contradiction.

(d) Deduce that $[L_n : \mathbb{Q}] = 2^n$.

15. Let $p$ be a prime number, and suppose that $a$ is a member of $\mathbb{Q}$ such that $X^p - a$ has no root in $\mathbb{Q}$. If $r$ is a member of $\mathbb{C}$ with $r^p = a$, prove that

(a) the cyclotomic polynomial $\Phi_p(X)$ is irreducible over $\mathbb{Q}(r)$,

(b) the splitting field $K$ of $X^p - a$ over $\mathbb{Q}$ has degree $[K : \mathbb{Q}] = p(p - 1)$,

(c) the Galois group $\mathrm{Gal}(K/\mathbb{Q})$ is isomorphic to a semidirect product of the multiplicative group of $\mathbb{F}_p$ and the additive group of $\mathbb{F}_p$, with the action of a member $m$ of the multiplicative group on the members $n$ of the additive group being given by $m(n) = mn$.
16. Let $F(X)$ be a polynomial in $k[X]$ of degree $n$, where $k$ is a field of characteristic 0, and let $K$ be a splitting field for $F(X)$ over $k$. Prove that $[K : k]$ divides $n!$.

17. Let $k$ be a field, and let $K$ be a quadratic extension $k(r)$, where $r^2 = a$ is a member of $k$.

(a) If $k$ has characteristic 0, determine all elements of $K$ whose squares are in $k$.

(b) What happens differently if the characteristic is different from 0?

18. Let $G$ be a finite group. Show that there exist two finite extensions $k$ and $K$ of $\mathbb{Q}$ such that $K$ is a Galois extension of $k$ and the Galois group $\mathrm{Gal}(K/k)$ is isomorphic to $G$.

19. Let $K/k$ be a finite normal extension. For $F(X)$ in $K[X]$ and $\sigma$ in $\mathrm{Gal}(K/k)$, let $F^\sigma(X)$ be the result of the substitution homomorphism $K[X] \to K[X]$ carrying $X$ to $X$ and extending the action of $\sigma$ on $K$, i.e., let $F^\sigma(X)$ be obtained by applying $\sigma$ to the coefficients of $F(X)$. Prove that $\prod_{\sigma \in \mathrm{Gal}(K/k)} F^\sigma(X)$ is in $k[X]$.

20. Corollary 9.37 concerns a separable algebraic extension $K/k$ and a finite subgroup $H$ of $\mathrm{Gal}(K/k)$, showing that $K/K^H$ is a finite Galois extension with $H = \mathrm{Gal}(K/K^H)$ and $[K : K^H] = |H|$. By going over its proof, obtain the conclusion that if $\{x_1, \ldots, x_n\}$ is the $H$ orbit of $x_1$ in $K$, then

(a) the minimal polynomial of $x_1$ over $K^H$ is $\prod_{j=1}^{n} (X - x_j)$,

(b) $n$ divides $|H|$,

(c) $K = K^H(x_1)$ if the isotropy subgroup of $H$ at $x_1$ is trivial.

21. Let $K$ be the transcendental extension $\mathbb{C}(z)$ of $\mathbb{C}$.

(a) Prove that any linear fractional transformation $\varphi(z) = \dfrac{az + b}{cz + d}$ with $a, b, c, d$ in $\mathbb{C}$ and $ad - bc \ne 0$ extends uniquely to a $\mathbb{C}$ automorphism of $K$.

(b) Let $H$ be the 4-element subgroup of $\mathrm{Gal}(K/\mathbb{C})$ generated by the extensions of $\sigma(z) = -z$ and $\tau(z) = 1/z$. Show that $w = z^2 + z^{-2}$ is invariant under $H$, and conclude that every member of $\mathbb{C}(w)$ lies in $K^H$.

(c) Applying the previous problem to the element $x_1 = z$ of $K$, show that the minimal polynomial of $z$ over $\mathbb{C}(w)$ has degree 4.

(d) Conclude that $K^H = \mathbb{C}(z^2 + z^{-2})$.

22. In characteristic 0, let $L/K$ and $K/k$ be quadratic extensions.

(a) Show that there exists an irreducible polynomial $F(X) = X^4 + bX^2 + c$ in $k[X]$ such that $F(r) = 0$ for some $r$ in $L$.

(b) Show that the element $r$ in (a) has $L = k(r)$.

(c) Show that $L$ is a normal extension of $k$ with Galois group $C_2 \times C_2$ if and only if $c$ is a square in $k$ for some polynomial as in (a), if and only if $c$ is a square in $k$ for every polynomial as in (a).

(d) Show that $L$ is a normal extension of $k$ with Galois group $C_4$ if and only if $c^{-1}(b^2 - 4c)$ is a square in $k$ for some polynomial as in (a), if and only if $c^{-1}(b^2 - 4c)$ is a square in $k$ for every polynomial as in (a).

(e) Give an example of quadratic extensions $L/K$ and $K/k$ in characteristic 0 such that $L/k$ is not normal.

23. Determine Galois groups for splitting fields over $\mathbb{Q}$ for the two polynomials $X^3 - 3X + 1$ and $X^3 + X + 1$.

24. Suppose that $F(X)$ is an irreducible cubic polynomial in $\mathbb{Q}[X]$ whose splitting field $K$ has $\mathrm{Gal}(K/\mathbb{Q})$ isomorphic to $\mathfrak{S}_3$. What are the possibilities, up to isomorphism, for the Galois group of a splitting field of $(X^3 - 1)F(X)$ over $\mathbb{Q}$?

25. Let $K/k$ be a finite Galois extension whose Galois group is isomorphic to $\mathfrak{S}_3$. Is $K$ necessarily a splitting field of some irreducible cubic polynomial in $k[X]$? Why or why not?

26. Is Cardan’s cubic formula valid for finding roots of reducible cubics $X^3 + pX + q$ in characteristic 0?

27.  Prove  that  the  discriminant  of  a  real  cubic  with  distinct  roots  is  positive  if  all  the 
roots  are  real,  and  is  negative  if  two  of  the  roots  are  complex. 

28. Let $F(X) = X^3 + pX + q$ be irreducible in $\mathbb{Q}[X]$, and suppose that $X - r$ is a factor for some $r$ in $\mathbb{C}$.

(a) Show that $F(X)$ factors in $\mathbb{Q}(r)[X]$ as $F(X) = (X - r)(X^2 + rX + (r^2 + p))$.

(b) We know that $\mathbb{Q}(r)$ is a splitting field for $F(X)$ over $\mathbb{Q}$ if and only if the discriminant $-4p^3 - 27q^2$ is a square in $\mathbb{Q}$. On the other hand, it is evident from the factorization of $F(X)$ that it splits in $\mathbb{Q}(r)$ if and only if the discriminant $r^2 - 4(r^2 + p)$ is a square in $\mathbb{Q}(r)$. Show by a direct calculation that these two conditions are equivalent.

29. Let $K$ be a splitting field of an irreducible cubic polynomial $F(X)$ in $\mathbb{Q}[X]$. If $\mathrm{Gal}(K/\mathbb{Q})$ is $\mathfrak{S}_3$, does it follow that $K$ contains all three cube roots of 1? Why or why not?

30.  In  characteristic  0,  let  K  be  the  splitting  field  over  k  of  an  irreducible  polynomial 
in  k[X]  of  degree  5.  Assuming  that  the  discriminant  of  the  polynomial  is  a  square 
in  k,  what  are  the  possibilities  for  Gal(K/k)  up  to  a  relabeling  of  the  indices? 

31. Determine the Galois group of a splitting field over $\mathbb{Q}$ for the polynomial $X^5 + 6X^3 - 12X^2 + 5X - 4$. Use of a computer may be helpful for this problem.

32. The proof of Theorem 9.64 introduced a positive integer $e'$ in its second paragraph. Prove that $e'$ equals the integer $e_1$ in the statement of the theorem.
33. Let $R$ be a Dedekind domain, let $F$ be its field of fractions, let $K$ be a finite separable extension of $F$, and let $L$ be a finite separable extension of $K$. Let $T$ be the integral closure of $R$ in $K$, and let $U$ be the integral closure of $R$ in $L$. Let $\mathfrak{p}$, $P$, and $Q$ be nonzero prime ideals in $R$, $T$, and $U$, respectively, and let the ramification indices and decomposition degrees for the extensions $L/K$, $K/F$, and $L/F$ be

$$e(Q|P),\ e(P|\mathfrak{p}),\ e(Q|\mathfrak{p}) \quad\text{and}\quad f(Q|P),\ f(P|\mathfrak{p}),\ f(Q|\mathfrak{p}).$$

Prove that

$$e(Q|\mathfrak{p}) = e(Q|P)\,e(P|\mathfrak{p}) \quad\text{and}\quad f(Q|\mathfrak{p}) = f(Q|P)\,f(P|\mathfrak{p}).$$


Problems  34-40  concern  norms  and  traces. 

34. Let $m$ be a square-free integer, and let $N$ and $\mathrm{Tr}$ denote the norm and trace from $\mathbb{Q}(\sqrt{m})$ to $\mathbb{Q}$.

(a) Show that $N(a + b\sqrt{m}) = a^2 - mb^2$ and $\mathrm{Tr}(a + b\sqrt{m}) = 2a$.

(b) Let $T$ be the ring of algebraic integers in $\mathbb{Q}(\sqrt{m})$. It was shown in Section VIII.9 that $T$ consists of all $a + b\sqrt{m}$ with $a, b$ in $\mathbb{Z}$ if $m \equiv 2 \bmod 4$ or $m \equiv 3 \bmod 4$, and of all $a + b\sqrt{m}$ with $a, b$ in $\mathbb{Z}$ or $a, b$ in $\mathbb{Z} + \frac{1}{2}$ if $m \equiv 1 \bmod 4$. Prove for $a + b\sqrt{m}$ in $\mathbb{Q}(\sqrt{m})$ that $a + b\sqrt{m}$ is in $T$ if and only if $N(a + b\sqrt{m})$ and $\mathrm{Tr}(a + b\sqrt{m})$ are both in $\mathbb{Z}$.

(c) Assume that $a + b\sqrt{m}$ is in $T$. Prove that $N(a + b\sqrt{m})$ is in $\mathbb{Z}^\times$ if and only if $a + b\sqrt{m}$ is in $T^\times$.

(d) For $m = 2$, give an example of a member of $T^\times$ other than $\pm 1$.

35. For the extension $\mathbb{Q}(\sqrt[3]{2})/\mathbb{Q}$, find the value of the norm $N$ and the trace $\mathrm{Tr}$ on a general element $a + b\sqrt[3]{2} + c(\sqrt[3]{2})^2$ of $\mathbb{Q}(\sqrt[3]{2})$; here $a, b, c$ are in $\mathbb{Q}$.

36. Let $N(\,\cdot\,)$ be the norm relative to the extension $\mathbb{Q}(\zeta)/\mathbb{Q}$, where $\zeta$ is a primitive $n$th root of 1.

(a) Show that $N(1 - \zeta) = \Phi_n(1)$, where $\Phi_n(X)$ is the $n$th cyclotomic polynomial.

(b) Using the formula $\prod_{d \mid n,\ d > 1} \Phi_d(X) = X^{n-1} + X^{n-2} + \cdots + 1$, show that $N(1 - \zeta) = \Phi_n(1)$ equals $p$ if $n$ is a power of the positive prime $p$ and equals 1 if $n$ is divisible by more than one positive prime.
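The values in (b) can be spot-checked for small $n$. The sketch below, with hypothetical helper names, computes $\Phi_n(X)$ with integer coefficients by dividing $X^n - 1$ by $\Phi_d(X)$ for the proper divisors $d$ of $n$, using $X^n - 1 = \prod_{d \mid n} \Phi_d(X)$, and compares $\Phi_n(1)$, the sum of the coefficients, with the value predicted in (b). It checks the formula numerically; it does not prove (a) or (b).

```python
from functools import lru_cache

def exact_quotient(a, b):
    """Quotient a / b of integer polynomials (lowest degree first), b monic;
    raises an error if the division leaves a remainder."""
    a = list(a)
    q = [0] * (len(a) - len(b) + 1)
    for i in range(len(a) - len(b), -1, -1):
        c = a[i + len(b) - 1]
        q[i] = c
        for j, bj in enumerate(b):
            a[i + j] -= c * bj
    if any(a):
        raise ArithmeticError("division was not exact")
    return q

@lru_cache(maxsize=None)
def cyclotomic(n):
    """Coefficients of Phi_n(X), lowest degree first, as a tuple."""
    f = [-1] + [0] * (n - 1) + [1]          # X^n - 1
    for d in range(1, n):
        if n % d == 0:
            f = exact_quotient(f, cyclotomic(d))
    return tuple(f)

def predicted_value(n):
    """p if n is a power of the prime p, and 1 if n has two or more prime factors."""
    p = next(d for d in range(2, n + 1) if n % d == 0)   # smallest prime factor
    m = n
    while m % p == 0:
        m //= p
    return p if m == 1 else 1

for n in range(2, 61):
    assert sum(cyclotomic(n)) == predicted_value(n), n
```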

37. Let $p > 0$ be a prime in $\mathbb{Z}$ of the form $4n + 1$. It was shown in Problem 31 at the end of Chapter VIII that such a prime is the sum of two squares. This problem gives a shorter proof. Take as known from Section VIII.4 that the ring $\mathbb{Z}[\sqrt{-1}]$ of Gaussian integers is a Euclidean domain, and from Problem 30 at the end of Chapter VIII that $x^2 \equiv -1 \bmod p$ has an integer solution $x$. Carry out the following steps:

(a) Write

$$\frac{x \pm \sqrt{-1}}{p} = \frac{x}{p} \pm \frac{1}{p}\sqrt{-1}.$$

If $p$ were prime in $\mathbb{Z}[\sqrt{-1}]$, then it would follow from the divisibility of $x^2 + 1$ by $p$ that $p$ divides $x + \sqrt{-1}$ or $p$ divides $x - \sqrt{-1}$. Deduce from the displayed equation that neither alternative is viable, and conclude that $p$ cannot be prime in $\mathbb{Z}[\sqrt{-1}]$.

(b) Using the conclusion of (a) to write $p$ as a nontrivial product in $\mathbb{Z}[\sqrt{-1}]$ and applying the norm function, prove that there exist integers $a$ and $b$ such that $p = a^2 + b^2$.
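The argument of Problem 37 is constructive: a nontrivial factor of $p$ in $\mathbb{Z}[\sqrt{-1}]$ is $\gcd(p, x + \sqrt{-1})$, computable by the Euclidean algorithm. The Python sketch below stores Gaussian integers as integer pairs and uses hypothetical helper names. To find $x$ with $x^2 \equiv -1 \bmod p$, it raises a quadratic nonresidue to the power $(p-1)/4$; this shortcut is chosen for brevity and is not the route of Problem 30 of Chapter VIII. Division rounds the exact quotient to the nearest Gaussian integer, so each remainder has norm at most half the norm of the divisor.

```python
def gmul(a, b):
    """Product of Gaussian integers a = a0 + a1 i and b = b0 + b1 i."""
    return (a[0] * b[0] - a[1] * b[1], a[0] * b[1] + a[1] * b[0])

def gmod(a, b):
    """Remainder of a on division by b != 0, with the quotient a/b rounded
    coordinatewise to the nearest Gaussian integer."""
    n = b[0] ** 2 + b[1] ** 2
    x = a[0] * b[0] + a[1] * b[1]            # real part of a * conj(b)
    y = a[1] * b[0] - a[0] * b[1]            # imaginary part of a * conj(b)
    q = ((2 * x + n) // (2 * n), (2 * y + n) // (2 * n))
    qb = gmul(q, b)
    return (a[0] - qb[0], a[1] - qb[1])

def ggcd(a, b):
    """Euclidean algorithm in Z[i]; the result is determined up to a unit."""
    while b != (0, 0):
        a, b = b, gmod(a, b)
    return a

def two_squares(p):
    """For a prime p = 4n + 1, return (a, b) with a^2 + b^2 = p."""
    c = next(c for c in range(2, p) if pow(c, (p - 1) // 2, p) == p - 1)  # nonresidue
    x = pow(c, (p - 1) // 4, p)              # then x^2 = -1 mod p
    a, b = ggcd((p, 0), (x, 1))              # gcd(p, x + i) has norm p
    return abs(a), abs(b)

for p in (5, 13, 17, 29, 37, 41, 97):
    a, b = two_squares(p)
    print(p, a, b, a * a + b * b == p)
```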

38. Let $p > 0$ be a prime in $\mathbb{Z}$ of the form $8n + 1$. Take as known from Problem 13 at the end of Chapter VIII that $\mathbb{Z}[\sqrt{-2}]$ is a Euclidean domain, and from the law of quadratic reciprocity (to be proved in Chapter I of Advanced Algebra) that $x^2 \equiv -2 \bmod p$ has an integer solution $x$. Guided by the argument for the previous problem, prove that there exist integers $a$ and $b$ such that $p = a^2 + 2b^2$.

39. Let $p > 0$ be a prime in $\mathbb{Z}$ of the form $6n + 1$. Take as known from Problem 26 at the end of Chapter VIII that $\mathbb{Z}[\frac{1}{2}(1 + \sqrt{-3})]$ is a Euclidean domain, and from the law of quadratic reciprocity (to be proved in Advanced Algebra) that $x^2 \equiv -3 \bmod p$ has an integer solution $x$. Guided by the argument for the previous problem, prove that there exist integers $a$ and $b$ such that $p = a^2 + 3b^2$.

40. Let $k \subseteq L \subseteq L'$ be fields such that $L'/k$ is a finite separable extension. Using Corollary 9.58, prove that the norm and trace satisfy

$$N_{L'/k} = N_{L/k} \circ N_{L'/L} \quad\text{and}\quad \mathrm{Tr}_{L'/k} = \mathrm{Tr}_{L/k} \circ \mathrm{Tr}_{L'/L}.$$

Problems 41-45 make use of the theory of symmetric polynomials, which was introduced in Problems 36-39 at the end of Chapter VIII.

41. Let $k$ be a field, let $F(X)$ be a polynomial in $k[X]$, let $K$ be an extension field in which $F(X)$ splits, and let $r_1, \ldots, r_n$ be the roots of $F(X)$ in $K$, repeated according to their multiplicities. If $P(X_1, \ldots, X_n)$ is a symmetric polynomial in $k[X_1, \ldots, X_n]$, prove that $P(r_1, \ldots, r_n)$ is a member of $k$.

42. Let $k$ be a field, let $F(X)$ and $G(X)$ be polynomials over $k$, let $K$ be an extension field in which $F(X)$ and $G(X)$ both split, and let $r_1, \ldots, r_m$ and $s_1, \ldots, s_n$ be the respective roots of $F(X)$ and $G(X)$ in $K$, repeated according to their multiplicities. Deduce from the previous problem that the polynomials

$$H_1(X) = \prod_{i=1}^{m} \prod_{j=1}^{n} (X - r_i - s_j) \quad\text{and}\quad H_2(X) = \prod_{i=1}^{m} \prod_{j=1}^{n} (X - r_i s_j)$$

lie in $k[X]$.
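For concrete polynomials over $\mathbb{Q}$, $H_1(X)$ can be formed numerically by expanding the product over all sums $r_i + s_j$ of complex roots. The sketch below does this in floating point with Python's `cmath` module and then rounds. Rounding is legitimate here because, for monic $F$ and $G$ with integer coefficients, the coefficients of $H_1$ are integers (they are symmetric in the roots), provided the floating-point error stays below $1/2$, as it comfortably does for small examples like these. This illustrates the statement of Problem 42 rather than proving it; the helper names are hypothetical.

```python
import cmath

def poly_from_roots(roots):
    """Coefficients, highest degree first, of the product of (X - z) over the roots."""
    coeffs = [1]
    for z in roots:
        coeffs = [a - z * b for a, b in zip(coeffs + [0], [0] + coeffs)]
    return coeffs

def H1(f_roots, g_roots):
    """Rounded coefficients of H_1(X) = prod_{i,j} (X - r_i - s_j), highest first."""
    coeffs = poly_from_roots([r + s for r in f_roots for s in g_roots])
    return [round(c.real) for c in coeffs]

sqrt2 = [2 ** 0.5, -(2 ** 0.5)]                                   # roots of X^2 - 2
sqrt3 = [3 ** 0.5, -(3 ** 0.5)]                                   # roots of X^2 - 3
cbrt2 = [2 ** (1 / 3) * cmath.exp(2j * cmath.pi * k / 3) for k in range(3)]  # X^3 - 2

print(H1(sqrt2, sqrt3))   # [1, 0, -10, 0, 1], i.e. X^4 - 10X^2 + 1
print(H1(sqrt2, cbrt2))   # degree 6, with root sqrt(2) + cbrt(2)
```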
43. (a) Find a nonzero polynomial with rational coefficients having $\sqrt{2} + \sqrt{3}$ as a root. What is the minimal polynomial of $\sqrt{2} + \sqrt{3}$ over $\mathbb{Q}$?

(b) Find a nonzero polynomial with rational coefficients having $\sqrt{2} + \sqrt[3]{2}$ as a root. What is the minimal polynomial of $\sqrt{2} + \sqrt[3]{2}$ over $\mathbb{Q}$?

44. Let $k$ be a field of characteristic 0, and let $K = k(r_1, \ldots, r_n)$ be the field of fractions of the polynomial ring $k[r_1, \ldots, r_n]$ in $n$ indeterminates. Show that any $\sigma$ in the symmetric group $\mathfrak{S}_n$ defines a member of $\mathrm{Gal}(K/k)$ such that $\sigma(r_j) = r_{\sigma(j)}$ for all $j$. Then define $F(X)$ to be the polynomial

$$F(X) = (X - r_1) \cdots (X - r_n)$$

in $K[X]$, and show that

(a) $F(X)$ is irreducible over the fixed field $K^{\mathfrak{S}_n}$,

(b) $K$ is a splitting field for $F(X)$ over $K^{\mathfrak{S}_n}$,

(c) $K^{\mathfrak{S}_n} = k(u_1, \ldots, u_n)$, where $u_1, \ldots, u_n$ are given by

$$u_1 = \sum_i r_i, \qquad u_2 = \sum_{i < j} r_i r_j, \qquad \ldots, \qquad u_n = \prod_i r_i,$$

(d) the Galois group of the splitting field of $F(X)$ over $k(u_1, \ldots, u_n)$ is $\mathfrak{S}_n$.

45. (Cubic resolvent) This problem carries out one step in finding the roots of an arbitrary quartic polynomial. Let $k$ be a field of characteristic 0, let $K = k(p, q, r)$ be the field of fractions of the polynomial ring $k[p, q, r]$ in three indeterminates, and let $L$ be a splitting field of the polynomial

$$F(X) = X^4 + pX^2 + qX + r$$

in $K[X]$. The Galois group $\mathrm{Gal}(L/K)$ is $\mathfrak{S}_4$ by the previous problem. Let $B_4 = \{(1), (1\ 2)(3\ 4), (1\ 3)(2\ 4), (1\ 4)(2\ 3)\}$. In the composition series $\mathfrak{S}_4 \supseteq \mathfrak{A}_4 \supseteq B_4 \supseteq \{(1), (1\ 2)(3\ 4)\} \supseteq \{1\}$, Proposition 9.63 shows that the fixed field of $\mathfrak{A}_4$ is $K(\sqrt{D})$, where $D$ is the discriminant. To obtain the fixed field of $B_4$, we adjoin to $K(\sqrt{D})$ an element of $L$ invariant under $B_4$ but not under $\mathfrak{A}_4$. If $s_1, s_2, s_3, s_4$ denote the roots of $F(X)$ in $L$, then such an element is $(s_1 + s_2)(s_3 + s_4)$. Its three conjugates under $\mathfrak{A}_4/B_4$ are

$$\theta_1 = (s_1 + s_2)(s_3 + s_4),$$
$$\theta_2 = (s_1 + s_3)(s_2 + s_4),$$
$$\theta_3 = (s_1 + s_4)(s_2 + s_3),$$

which are the three roots of the “cubic resolvent” polynomial

$$\theta^3 - c_1\theta^2 + c_2\theta - c_3,$$

where $c_1, c_2, c_3$ are the elementary symmetric polynomials in $\theta_1, \theta_2, \theta_3$ given by

$$c_1 = \sum_i \theta_i, \qquad c_2 = \sum_{i < j} \theta_i\theta_j, \qquad c_3 = \prod_i \theta_i.$$

(a) Show that $c_1, c_2, c_3$ are symmetric polynomials in $s_1, s_2, s_3, s_4$, hence are polynomials in the coefficients $p, q, r$.

(b) Verify that $c_1 = 2p$, $c_2 = p^2 - 4r$, and $c_3 = -q^2$.

(c) Show that the discriminant of the cubic resolvent equals the discriminant of the original quartic polynomial.
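Parts (b) and (c) can be spot-checked with exact integer arithmetic. The sketch below, whose helper names are hypothetical, picks integers $s_1, \ldots, s_4$ with $s_1 + s_2 + s_3 + s_4 = 0$, so that $\prod (X - s_i)$ has no $X^3$ term, reads off $p, q, r$ from the elementary symmetric polynomials, and compares $c_1, c_2, c_3$ and the two discriminants. A passing check for a few sample root sets is evidence, not a proof.

```python
from itertools import combinations

def elementary(values):
    """[e_1, ..., e_n]: elementary symmetric polynomials evaluated at the values."""
    e = [1]
    for v in values:
        e = [a + v * b for a, b in zip(e + [0], [0] + e)]
    return e[1:]

def discriminant(roots):
    """Product of (x - y)^2 over all pairs of roots."""
    out = 1
    for x, y in combinations(roots, 2):
        out *= (x - y) ** 2
    return out

def check_resolvent(s):
    s1, s2, s3, s4 = s
    assert s1 + s2 + s3 + s4 == 0
    e1, e2, e3, e4 = elementary(s)
    p, q, r = e2, -e3, e4        # X^4 - e1 X^3 + e2 X^2 - e3 X + e4, with e1 = 0
    thetas = [(s1 + s2) * (s3 + s4), (s1 + s3) * (s2 + s4), (s1 + s4) * (s2 + s3)]
    c1, c2, c3 = elementary(thetas)
    assert (c1, c2, c3) == (2 * p, p * p - 4 * r, -q * q)   # part (b)
    assert discriminant(thetas) == discriminant(s)          # part (c)
    return p, q, r

for roots in [(1, 2, 3, -6), (1, 1, 1, -3), (2, -1, 5, -6), (0, 4, -7, 3)]:
    print(roots, check_resolvent(roots))
```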

Problems 46-50 concern Galois groups of splitting fields of quartic polynomials. Take as known that the discriminant of a quartic polynomial $F(X) = X^4 + pX^2 + qX + r$ is given by

$$-4p^3q^2 - 27q^4 + 16p^4r + 144pq^2r - 128p^2r^2 + 256r^3.$$

Let $K$ be a splitting field for $F(X)$ over $\mathbb{Q}$, and let $G = \mathrm{Gal}(K/\mathbb{Q})$. Regard $G$ as a subgroup of the symmetric group $\mathfrak{S}_4$.
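The displayed formula can be compared with the definition of the discriminant as $\prod_{i<j}(s_i - s_j)^2$ over the roots. The Python sketch below, with hypothetical helper names and exact integer arithmetic, does this for a few quartics built from integer roots with sum 0, and then evaluates the formula for $X^4 + 8X + 12$, the polynomial of Problem 49.

```python
from itertools import combinations
from math import isqrt

def quartic_discriminant(p, q, r):
    """The displayed formula for X^4 + pX^2 + qX + r."""
    return (-4 * p**3 * q**2 - 27 * q**4 + 16 * p**4 * r
            + 144 * p * q**2 * r - 128 * p**2 * r**2 + 256 * r**3)

def coefficients_from_roots(s):
    """(p, q, r) for (X - s1)(X - s2)(X - s3)(X - s4), assuming the roots sum to 0."""
    assert sum(s) == 0
    p = sum(x * y for x, y in combinations(s, 2))
    q = -sum(x * y * z for x, y, z in combinations(s, 3))
    r = s[0] * s[1] * s[2] * s[3]
    return p, q, r

def discriminant_from_roots(s):
    out = 1
    for x, y in combinations(s, 2):
        out *= (x - y) ** 2
    return out

for s in [(1, 2, 3, -6), (2, -1, 5, -6), (0, 4, -7, 3)]:
    assert quartic_discriminant(*coefficients_from_roots(s)) == discriminant_from_roots(s)

D = quartic_discriminant(0, 8, 12)       # X^4 + 8X + 12
print(D, isqrt(D) ** 2 == D)             # 331776 True
```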

46. (a) Identify all transitive subgroups of the alternating group $\mathfrak{A}_4$, up to a relabeling of the four indices.

(b) Identify all transitive subgroups of the symmetric group $\mathfrak{S}_4$ other than those in (a), up to a relabeling of the four indices.

47. Suppose $q = 0$.

(a) Show that $G$ is a subgroup of $\mathfrak{A}_4$ if and only if $r$ is a square in $\mathbb{Q}$.

(b) Show by solving $F(X) = 0$ explicitly that $[K : \mathbb{Q}]$ is a power of 2, and conclude that $G$ has no element of order 3.

(c) Deduce when $r$ is a square that $G = \{(1), (1\ 2)(3\ 4), (1\ 3)(2\ 4), (1\ 4)(2\ 3)\}$ if $F(X)$ is irreducible over $\mathbb{Q}$.

(d) Deduce when $r$ is a nonsquare that $G$ is cyclic of order 4 or is dihedral of order 8 if $F(X)$ is irreducible over $\mathbb{Q}$; in the dihedral case, $G$ is generated by a 4-cycle and the group listed in (c). (Problem 22 shows how to distinguish between the two cases.)

48. For $F(X) = X^4 + X + 1$, show by considering reduction modulo 2 and modulo 3 that $G = \mathfrak{S}_4$.

49. Let $F(X) = X^4 + 8X + 12$.

(a) Compute the discriminant of $F(X)$, and verify that it is a square.

(b) Show that $F(X) \equiv (1 + X)(2 + X + 4X^2 + X^3) \bmod 5$ with the two factors on the right side irreducible in $\mathbb{F}_5[X]$.

(c) Show from (a) and (b) that if $F(X)$ is reducible over $\mathbb{Q}$, then it must have a root that is an integer. Check that there is no such root.

(d) Conclude that $G = \mathfrak{A}_4$.
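The reductions used in Problems 48 and 49(b) can be checked with a small trial-division factorizer over $\mathbb{F}_p$. The Python sketch below (hypothetical helper names; practical only for small $p$ and small degree) reports the degrees of the irreducible factors, that is, the cycle lengths contributed by each prime. It also evaluates $X^4 + 8X + 12$ at the integer divisors of 12, the only candidates for an integer root.

```python
from itertools import product

def trim(a):
    while a and a[-1] == 0:
        a.pop()
    return a

def divmod_monic(a, b, p):
    """Quotient and remainder of a by monic b over F_p (lowest degree first)."""
    a = [c % p for c in a]
    q = [0] * max(len(a) - len(b) + 1, 1)
    for i in range(len(a) - len(b), -1, -1):
        c = a[i + len(b) - 1]
        q[i] = c
        for j, bj in enumerate(b):
            a[i + j] = (a[i + j] - c * bj) % p
    return trim(q), trim(a[:len(b) - 1])

def cycle_type(coeffs, p):
    """Sorted degrees of the monic irreducible factors of coeffs modulo p."""
    f, degrees = trim([c % p for c in coeffs]), []
    while len(f) > 1:
        n, g = len(f) - 1, None
        for d in range(1, n // 2 + 1):          # a least-degree factor is irreducible
            for tail in product(range(p), repeat=d):
                if not divmod_monic(f, list(tail) + [1], p)[1]:
                    g = list(tail) + [1]
                    break
            if g:
                break
        g = g or f                              # no small factor: f is irreducible
        degrees.append(len(g) - 1)
        f = divmod_monic(f, g, p)[0]
    return sorted(degrees)

print(cycle_type([1, 1, 0, 0, 1], 2), cycle_type([1, 1, 0, 0, 1], 3))   # [4] [1, 3]
print(cycle_type([12, 8, 0, 0, 1], 5))                                   # [1, 3]
divisors = (1, -1, 2, -2, 3, -3, 4, -4, 6, -6, 12, -12)
print([x**4 + 8 * x + 12 for x in divisors])                             # no zeros
```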
50. For each transitive group $G$ as in Problem 46, find a polynomial $F(X)$ of degree 4 over $\mathbb{Q}$ whose splitting field $K$ over $\mathbb{Q}$ has $\mathrm{Gal}(K/\mathbb{Q})$ isomorphic to $G$.


Problems 51-56 continue the introduction to error-correcting codes begun in Problems 63-73 at the end of Chapter IV and continued in Problems 25-28 at the end of Chapter VII. The current problems will not make use of the problems in Chapter VII. As in the problems in Chapter IV, we work with the field $\mathbb{F} = \mathbb{Z}/2\mathbb{Z}$, with Hamming space $\mathbb{F}^n$, and with linear codes $C$ in $\mathbb{F}^n$. The minimal distance of $C$ is denoted by $\delta(C)$. Problem 72 in Chapter IV introduced cyclic redundancy codes, which are determined by a generating polynomial $G(X)$ of some degree $g$ suitably less than $n$. Such a code $C$ is built from all polynomials $G(X)B(X)$ with $B(X) = 0$ or $\deg B(X) \le n - g - 1$. A given polynomial $c_0 + c_1X + \cdots$ becomes the $n$-tuple $(c_0, c_1, \ldots)$ of $C$; the code $C$ has dimension $n - g$. This set of problems will discuss a special class of cyclic redundancy codes called cyclic codes, and then a special subclass called BCH codes.

51. A linear code $C$ in $\mathbb{F}^n$ is called a cyclic code if whenever $(c_0, c_1, \ldots, c_{n-1})$ is in $C$, then so is $(c_{n-1}, c_0, c_1, \ldots, c_{n-2})$.

(a) Prove that a linear code $C$ is cyclic if and only if the set of all polynomials $c_0 + c_1X + \cdots + c_{n-1}X^{n-1}$ corresponding to members $(c_0, c_1, \ldots, c_{n-1})$ of $C$ is an ideal in the ring $\mathbb{F}[X]/(X^n - 1)$. (In this case the members of $C$ will be identified with the set of such polynomials.)

(b) Prove that if $C$ is cyclic and nonzero, then there exists a unique $G(X)$ in $C$ of lowest possible degree. Moreover, $G(X)$ divides $X^n - 1$ in $\mathbb{F}[X]$, and $C$ consists exactly of the polynomials $G(X)F(X) \bmod (X^n - 1)$ such that $F(X) = 0$ or $\deg F(X) \le n - \deg G(X) - 1$, and $C$ has dimension $n - \deg G(X)$. (The polynomial $G(X)$ is called the generating polynomial of $C$. A cyclic code $C$ over the field $\mathbb{Z}/2\mathbb{Z}$ having block length $n$ and dimension $k$ is called a binary cyclic $(n, k)$ code.)

(c) Prove that if $G(X)$ has degree $n - k$, then a basis of $C$ consists of the polynomials $G(X), XG(X), X^2G(X), \ldots, X^{k-1}G(X)$.

(d) Under the assumption that $C$ is cyclic and nonzero, (b) says that it is possible to write $X^n - 1 = G(X)H(X)$ for some $H(X)$ in $\mathbb{F}[X]$. Prove that a member $B(X)$ of $\mathbb{F}[X]/(X^n - 1)$ lies in $C$ if and only if $H(X)B(X) \equiv 0 \bmod (X^n - 1)$.

52. (a) Show that the row space $C$ of the matrix

$$Q = \begin{pmatrix} 1 & 0 & 0 & 1 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 & 1 & 1 & 0 \\ 0 & 0 & 1 & 0 & 1 & 1 & 1 \end{pmatrix}$$

is a cyclic $(7, 3)$ code with generating polynomial $G(X) = 1 + X^2 + X^3 + X^4$.

(b) Show directly from $Q$ that $C$ has minimal distance $\delta = 4$.

(c) The polynomial $H(X) = 1 + X^2 + X^3$ has the property that $G(X)H(X) = X^7 - 1$ in $\mathbb{F}[X]$. Find a 4-by-7 matrix $\mathcal{H}$ such that the column vectors $v \in \mathbb{F}^7$ that lie in $C$ are exactly the ones with $\mathcal{H}v = 0$.
(d) The matrix $\mathcal{H}$ in (c) is called the check matrix for the code. Describe a procedure for constructing the check matrix when starting from a general binary cyclic $(n, k)$ code whose generating polynomial $G(X)$ is known and whose polynomial $H(X)$ with $G(X)H(X) = X^n - 1$ is known. Prove that the procedure works.
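Parts (a) and (b) of Problem 52, and the identity $G(X)H(X) = X^7 - 1$ in (c), can be checked by enumerating the eight codewords. In the Python sketch below (hypothetical helper names), codewords are tuples $(c_0, \ldots, c_6)$ and polynomials over $\mathbb{F}$ are coefficient lists, lowest degree first; over $\mathbb{F} = \mathbb{Z}/2\mathbb{Z}$ the polynomial $X^7 - 1$ equals $X^7 + 1$.

```python
from itertools import product

Q = [
    [1, 0, 0, 1, 0, 1, 1],
    [0, 1, 0, 1, 1, 1, 0],
    [0, 0, 1, 0, 1, 1, 1],
]

def row_space(rows):
    """All F_2-linear combinations of the rows, as tuples."""
    n = len(rows[0])
    return {tuple(sum(c * row[j] for c, row in zip(coeffs, rows)) % 2 for j in range(n))
            for coeffs in product((0, 1), repeat=len(rows))}

def shift(word):
    """(c_0, ..., c_{n-1}) -> (c_{n-1}, c_0, ..., c_{n-2})."""
    return word[-1:] + word[:-1]

def poly_mul_f2(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] ^= x & y
    return out

C = row_space(Q)
G = [1, 0, 1, 1, 1]                      # 1 + X^2 + X^3 + X^4
H = [1, 0, 1, 1]                         # 1 + X^2 + X^3
nonzero = [w for w in C if any(w)]
print(len(C), all(shift(w) in C for w in C))                          # 8 True
print(min(max(i for i, c in enumerate(w) if c) for w in nonzero))     # lowest degree: 4
print(tuple(G + [0, 0]) in C, min(sum(w) for w in nonzero))           # True 4
print(poly_mul_f2(G, H))                                              # 1 + X^7
```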

53. Show that $X^n - 1$ is a separable polynomial over $\mathbb{F}$ if $n$ is odd but not if $n$ is even.

54. Let $C$ be a binary cyclic $(n, k)$ code with generating polynomial $G(X)$, and suppose that $n$ is odd. Let $K$ be a finite extension field of $\mathbb{F}$ in which $X^n - 1$ splits, and let $\alpha$ be a primitive $n$th root of 1, i.e., a root of $X^n - 1$ in $K$ such that $\alpha^m \ne 1$ for $0 < m < n$. Suppose that $r$ and $s$ are integers with $0 < s < n$ and

$$G(\alpha^r) = G(\alpha^{r+1}) = \cdots = G(\alpha^{r+s}) = 0.$$

(a) Let $P(X) = G(X)F(X)$ with $F(X) \ne 0$ and $\deg F < k$ be an arbitrary nonzero member of $C$, so that $P(\alpha^r) = P(\alpha^{r+1}) = \cdots = P(\alpha^{r+s}) = 0$. Write $P(X) = c_0 + c_1X + \cdots + c_{n-1}X^{n-1}$, and use the values of $P(\alpha^j)$ for $r \le j \le r + s$ to set up a homogeneous system of $s + 1$ linear equations with $n$ unknowns $c_0, \ldots, c_{n-1}$.

(b) Using an argument with Vandermonde determinants, show that every $(s+1)$-by-$(s+1)$ submatrix of the coefficient matrix of the system in (a) is invertible.

(c) Obtain a contradiction from (b) if $s + 1$ or fewer of the coefficients of $P(X)$ are nonzero.

(d) Conclude that the minimal distance $\delta(C)$ is $\ge s + 2$.

55. (BCH codes, or Bose-Chaudhuri-Hocquenghem codes) Let $n$ be an odd positive integer, let $e$ be a positive integer $< n/2$, let $K$ be a finite extension field of $\mathbb{F}$ in which $X^n - 1$ splits, and let $\alpha$ be a primitive $n$th root of 1 in $K$. For $1 \le j \le 2e$, let $F_j(X)$ be the minimal polynomial of $\alpha^j$ over $\mathbb{F}$, and define $G(X) = (1 + X)\,\mathrm{LCM}(F_1(X), \ldots, F_{2e}(X))$. Prove that $G(X)$ divides $X^n - 1$ and that $G(X)$ is the generating polynomial for a cyclic code $C$ in $\mathbb{F}^n$ with minimal distance $\delta(C) \ge 2e + 2$. (Educational note: Therefore $C$ has the built-in capability of correcting at least $e$ errors.)

56. In the setting of the previous problem, let $n = 2^m - 1$ for a positive integer $m$, and let $K$ be a field of order $2^m$.

(a) Prove that any irreducible polynomial in $\mathbb{F}[X]$ with a root in $K$ has degree dividing $m$, and conclude that the degree of the generating polynomial $G(X)$ in the previous problem is at most $2em + 1$.

(b) Prove that there exists a sequence $C_r$ of binary cyclic $(n_r, k_r)$ codes of BCH type such that $k_r/n_r$ tends to 1 and the minimal distance $\delta(C_r)$ tends to infinity. (Educational note: The fraction $k_r/n_r$ tells the fraction of message bits to total bits in each transmitted block. Thus the problem says that there are linear codes capable of correcting as large a number of errors as we please while having as large a percentage of message bits as we please.)
57. Take as known that $F_1(X) = 1 + X + X^4$ is irreducible over $\mathbb{F}$. Let $K$ be the field $\mathbb{F}[X]/(F_1(X))$ of order 16, and let $\alpha$ be the coset $X + (F_1(X))$ in $K$.

(a) Explain why $F_1(X)$ factors as $F_1(X) = (X - \alpha)(X - \alpha^2)(X - \alpha^4)(X - \alpha^8)$ over $K$.

(b) Find the minimal polynomial $F_3(X)$ of $\alpha^3$.

(c) Show in $\mathbb{F}^{15}$ that the binary cyclic code $C$ with generating polynomial $G(X) = (1 + X)F_1(X)F_3(X)$ has $\dim C = 6$ and $\delta(C) \ge 6$.
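The computations in Problem 57 are small enough to check by machine. In the Python sketch below (hypothetical helper names), an element $c_0 + c_1\alpha + c_2\alpha^2 + c_3\alpha^3$ of $K$ is stored as the 4-bit integer with bits $c_0, \ldots, c_3$, and products are reduced with $\alpha^4 = \alpha + 1$. The minimal polynomial of $\beta$ is formed as the product of $X - \beta^{2^i}$ over the distinct conjugates $\beta^{2^i}$. Binary polynomials for the code are stored as integers, bit $i$ being the coefficient of $X^i$, and the minimal distance is found by listing all $2^6 - 1$ nonzero codewords.

```python
def gf16_mul(a, b):
    """Multiply in F_2[X]/(1 + X + X^4); elements are 4-bit integers."""
    out = 0
    for i in range(4):
        if (b >> i) & 1:
            out ^= a << i
    for deg in (6, 5, 4):                        # X^deg = X^(deg-4) (X + 1)
        if (out >> deg) & 1:
            out ^= (1 << deg) | (0b11 << (deg - 4))
    return out

def min_poly(beta):
    """Minimal polynomial of beta over F_2, lowest degree first."""
    conjugates, x = [], beta
    while x not in conjugates:                   # beta, beta^2, beta^4, ...
        conjugates.append(x)
        x = gf16_mul(x, x)
    poly = [1]
    for c in conjugates:                         # multiply by X - c = X + c
        poly = [gf16_mul(c, a) ^ b for a, b in zip(poly + [0], [0] + poly)]
    return poly

def to_int(poly):
    return sum(c << i for i, c in enumerate(poly))

def f2_mul(a, b):
    """Carry-less product of binary polynomials stored as integers."""
    out = 0
    while b:
        if b & 1:
            out ^= a
        a, b = a << 1, b >> 1
    return out

def f2_mod(a, b):
    while a and a.bit_length() >= b.bit_length():
        a ^= b << (a.bit_length() - b.bit_length())
    return a

alpha = 0b0010
alpha3 = gf16_mul(alpha, gf16_mul(alpha, alpha))
F1, F3 = min_poly(alpha), min_poly(alpha3)
G = f2_mul(f2_mul(0b11, to_int(F1)), to_int(F3))    # (1 + X) F_1(X) F_3(X)
k = 15 - (G.bit_length() - 1)                       # dimension of C
weights = [bin(f2_mul(G, B)).count("1") for B in range(1, 2 ** k)]
print(F1, F3)                                       # [1, 1, 0, 0, 1] [1, 1, 1, 1, 1]
print(f2_mod((1 << 15) | 1, G), k, min(weights))    # 0 6 6
```

The first printed value confirms that $G(X)$ divides $X^{15} - 1$; the last two give $\dim C = 6$ and a minimal weight of 6 for this particular code.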

Problems 58-63 combine Problems 12-13 in Chapter V with the notion of extension
of scalars from Chapter VI and some Galois theory from Chapter IX to prove the
general Jordan-Chevalley decomposition. Let $k$ be a field, and let $V$ be a
finite-dimensional vector space over $k$. A linear map $N: V \to V$ is called nilpotent if
$N^r = 0$ for some positive integer $r$. A linear map $S: V \to V$ is called semisimple if there is
some finite extension $K$ of $k$ for which the linear map $S^K: V^K \to V^K$ obtained by
extension of scalars has a basis of eigenvectors. The theorem is that if $L: V \to V$ is a
linear map with the property that every irreducible factor of the minimal polynomial
of $L$ over $k$ is separable, then $L$ has a unique decomposition $L = S + N$ with $S$
semisimple, $N$ nilpotent, and $SN = NS$. The theorem applies without restriction
to a linear map $L: V \to V$ if $k$ is finite or has characteristic 0 because the separability
condition is automatically satisfied in these cases.
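The theorem does not by itself say how to compute $S$ and $N$. One standard computational route, different from the Galois-descent argument of Problems 58-59, is Newton's iteration $S \leftarrow S - f(S)f'(S)^{-1}$, starting from $S = L$. Here $f$ is the product of the distinct irreducible factors of the minimal polynomial of $L$. Separability of $f$ is what keeps $f'(S)$ invertible along the way.

The Python sketch below works over $\mathbb{Q}$ with exact `Fraction` arithmetic and treats matrices as lists of rows. All function names are hypothetical. The 4-by-4 matrix in the usage lines is an illustrative choice with minimal polynomial $(X^2 + 1)^2$; it is not necessarily the matrix of Problem 61.

```python
from fractions import Fraction


def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]


def mat_mul(A, B):
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def mat_add(A, B, scale=1):
    """Return A + scale * B."""
    return [[a + scale * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def mat_inv(A):
    """Inverse by Gauss-Jordan elimination over Q (raises StopIteration if A is singular)."""
    n = len(A)
    M = [[Fraction(x) for x in row] + e for row, e in zip(A, identity(n))]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [x / p for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]


def poly_at(coeffs, A):
    """Evaluate sum coeffs[i] X^i (lowest degree first) at the square matrix A by Horner's rule."""
    n = len(A)
    result = [[Fraction(0)] * n for _ in range(n)]
    for c in reversed(coeffs):
        result = mat_add(mat_mul(result, A), identity(n), c)
    return result


def jordan_chevalley(A, f):
    """Newton iteration S <- S - f(S) f'(S)^{-1}, with f the squarefree part of the
    minimal polynomial of A. Returns (S, N) with A = S + N."""
    df = [i * c for i, c in enumerate(f)][1:]  # coefficients of f'
    S = [[Fraction(x) for x in row] for row in A]
    zero = [[0] * len(A) for _ in A]
    while poly_at(f, S) != zero:
        S = mat_add(S, mat_mul(poly_at(f, S), mat_inv(poly_at(df, S))), -1)
    N = mat_add([[Fraction(x) for x in row] for row in A], S, -1)
    return S, N


# Hypothetical example: A = [[J, I], [0, J]] with J = [[0, -1], [1, 0]].
A = [[0, -1, 1, 0], [1, 0, 0, 1], [0, 0, 0, -1], [0, 0, 1, 0]]
f = [1, 0, 1]  # X^2 + 1, the squarefree part of (X^2 + 1)^2
S, N = jordan_chevalley(A, f)
```

For this matrix one Newton step already suffices, because $f(A)^2 = (A^2 + I)^2 = 0$. The result is the block-diagonal $S = \mathrm{diag}(J, J)$ together with $N$ equal to the off-diagonal identity block.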

58. Let $k$ be a field, let $V$ be a vector space over $k$, and let $K$ be an extension field of
$k$. Extend scalars to form the $K$ vector space given by $V^K = V \otimes_k K$, and let
$\mathrm{Gal}(K/k)$ act on $V^K$ by saying that $\varphi(v \otimes c) = v \otimes \varphi(c)$ for $\varphi$ in $\mathrm{Gal}(K/k)$ and
$v \otimes c$ in $V^K$. Explain for $V = k^n$ that $V^K$ may be interpreted as $K^n$ and that the
action by $\varphi$ reduces to $(\varphi(u))_j = \varphi(u_j)$.

59.  Let  k  be  a  field,  let  V  be  a  finite-dimensional  vector  space  over  k,  and  let 
$L: V \to V$ be a linear map. Suppose that every irreducible factor of the minimal
polynomial  of  L  over  k  is  separable.  Prove  the  existence  of  a  Jordan-Chevalley 
decomposition  of  L  by  following  these  steps: 

(a) Let $K$ be a splitting field of the minimal polynomial of $L$ over $k$, so that $K$ is a
finite Galois extension of $k$. Use Problems 12-13 of Chapter V to show that
$L \otimes 1: V^K \to V^K$ has a unique decomposition as a sum $\widetilde S + M$ of $K$ linear maps
of $V^K$ to itself such that $\widetilde S M = M \widetilde S$, $M$ is nilpotent, and $\widetilde S$ has a basis of eigenvectors.

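(b) Prove that any $K$ linear $\widetilde T: V^K \to V^K$ such that $(1 \otimes \varphi)\widetilde T = \widetilde T(1 \otimes \varphi)$ for all
$\varphi \in \mathrm{Gal}(K/k)$ is of the form $\widetilde T = T \otimes 1$ for a unique $k$ linear $T: V \to V$.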

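(c) Show that the $K$ linear maps $\widetilde S$ and $M$ of (a) satisfy $(1 \otimes \varphi)\widetilde S = \widetilde S(1 \otimes \varphi)$
and $(1 \otimes \varphi)M = M(1 \otimes \varphi)$ for all $\varphi \in \mathrm{Gal}(K/k)$, and deduce from (b) that
$\widetilde S$ and $M$ may be written as $\widetilde S = S \otimes 1$ and $M = N \otimes 1$ for uniquely defined
$k$ linear maps $S$ and $N$ of $V$ into itself.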

(d) Show that $S$ is semisimple, $N$ is nilpotent, and $SN = NS$, and conclude
that $L = S + N$ is a Jordan-Chevalley decomposition of $L$.
(e) Show that S and N are polynomials in L.

60. Let $k$ be a field, let $V$ be a finite-dimensional vector space over $k$, and let
$L: V \to V$ be a linear map. Prove the uniqueness result that there is at most
one decomposition $L = S + N$ with $S$ semisimple, $N$ nilpotent, and $SN = NS$.

61. Let $k = \mathbb{R}$, and let $L: \mathbb{R}^4 \to \mathbb{R}^4$ be the linear map defined by the matrix


A  = 


o 

l 

o  - 

i 


The minimal polynomial of $L$ or $A$ is $(X^2 + 1)^2$. Calculate the Jordan-Chevalley
decomposition of $L$ in matrix form.

62. Let $\mathbb{F}_2$ be a field of two elements, and let $k = \mathbb{F}_2(x)$, where $x$ is transcendental
over $\mathbb{F}_2$. Let $L: k^2 \to k^2$ be the linear map defined by the matrix A = ( ^ q ).

The characteristic polynomial of $L$ or $A$ is $M(X) = X^2 - x$. This is irreducible
over $k$ and hence is also the minimal polynomial. The quadratic extension
$K = k[x^{1/2}]$ of $k$ is a splitting field for $M(X)$, and $M(X)$ has a double root in
$k[x^{1/2}]$.

(a) Show that $A$, regarded as a matrix in $M_2(K)$, does not have a basis of
eigenvectors. Conclude that $L$ is not semisimple.

(b)  Calculate  the  most  general  2-by-2  matrix  commuting  with  A,  and  show  that 
it cannot have characteristic polynomial $X^2$ unless it is the 0 matrix.

(c)  Conclude  that  L  cannot  have  a  Jordan-Chevalley  decomposition. 

63. Let $k$ be a field, let $V$ be a finite-dimensional vector space over $k$, and let
$L: V \to V$ be an invertible linear map. Suppose that every irreducible factor
of the minimal polynomial of $L$ over $k$ is separable. A linear map $U: V \to V$
is called unipotent if $(U - I)^r = 0$ for some positive integer $r$. By suitably adjusting the proof
of the Jordan-Chevalley decomposition, prove that there exist linear maps $S$ and
$U$ of $V$ into itself such that $S$ is semisimple, $U$ is unipotent, and $L = SU = US$.
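For Problem 63, one natural adjustment is to set $U = S^{-1}L = 1 + S^{-1}N$. This is sketched here under the assumption that the additive decomposition $L = S + N$ is already known. $S$ is invertible because it has the same characteristic polynomial as $L$. Moreover $S^{-1}$ is a polynomial in $S$, so it commutes with $N$, and therefore $S^{-1}N$ is nilpotent.

The Python check below uses a hypothetical 4-by-4 example. In it, $L = [[J, I], [0, J]]$ with $J = [[0, -1], [1, 0]]$, and $S = \mathrm{diag}(J, J)$ satisfies $S^2 = -1$, so that $S^{-1} = -S$. The matrices and helper names are illustrative choices. The code confirms the needed commutation and nilpotence before forming $U$.

```python
def mat_mul(A, B):
    """Product of two square integer matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][t] * B[t][j] for t in range(n)) for j in range(n)] for i in range(n)]


def mat_sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


I4 = [[int(i == j) for j in range(4)] for i in range(4)]
ZERO = [[0] * 4 for _ in range(4)]

# Hypothetical example: L = [[J, I], [0, J]] with J = [[0, -1], [1, 0]].
L = [[0, -1, 1, 0], [1, 0, 0, 1], [0, 0, 0, -1], [0, 0, 1, 0]]
# Candidate semisimple part: the block-diagonal matrix diag(J, J), so S^2 = -I.
S = [[0, -1, 0, 0], [1, 0, 0, 0], [0, 0, 0, -1], [0, 0, 1, 0]]
N = mat_sub(L, S)

# Sanity checks on the additive decomposition L = S + N.
assert mat_mul(S, N) == mat_mul(N, S)
assert mat_mul(N, N) == ZERO

S_inv = [[-x for x in row] for row in S]   # valid only because S^2 = -I here
assert mat_mul(S_inv, S) == I4

U = mat_mul(S_inv, L)                      # U = S^{-1} L = I + S^{-1} N
U_minus_I = mat_sub(U, I4)
print(U)
```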

Problems 64-73 introduce ordered fields, formally real fields, and real closed fields.
An ordered field $k$ is a field with a specified subset $P$ of “positive” elements that is
closed under addition and multiplication and is such that each nonzero element of $k$
is in exactly one of $P$ and $-P$. The fields $\mathbb{Q}$ and $\mathbb{R}$ are examples. A formally real
field $k$ is a field in which $-1$ is not a sum of squares. A real closed field $k$ is a
formally real field such that no proper algebraic extension of $k$ is formally real. The
problems together prove the existence part of the Artin-Schreier Theorem: If $k$ is
an ordered field with $P$ as its set of positive elements and if $\bar k$ is an algebraic closure,
then there exists a real closed field $K$ between $k$ and $\bar k$ that is an ordered field with $P$
contained in its set of positive elements. Moreover, $K$ is unique up to $k$ isomorphism,
and $\bar k$ is of the form $K(\sqrt{-1})$.
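The definition of an ordered field can be illustrated with a positive set $P$ other than the familiar positive reals. The Python sketch below implements the positivity test that Problem 66 proposes for $k(x)$, taking $k = \mathbb{Q}$. An element of $\mathbb{Q}(x)$ is stored as a numerator-denominator pair of coefficient lists, lowest degree first. These conventions and all helper names are assumptions made for illustration.

The test does not depend on the chosen representative. Multiplying numerator and denominator by the same nonzero polynomial multiplies both leading coefficients by the same factor, so their ratio is unchanged.

```python
from fractions import Fraction


def trim(p):
    """Drop trailing zero coefficients (coefficient lists are lowest degree first)."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p


def p_add(p, q):
    n = max(len(p), len(q))
    return trim((p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0) for i in range(n))


def p_mul(p, q):
    if not p or not q:
        return []
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return trim(r)


# An element of Q(x) is a pair (numerator, denominator) with nonzero denominator.
def rf_add(u, v):
    return (p_add(p_mul(u[0], v[1]), p_mul(v[0], u[1])), p_mul(u[1], v[1]))


def rf_mul(u, v):
    return (p_mul(u[0], v[0]), p_mul(u[1], v[1]))


def rf_neg(u):
    return ([-a for a in u[0]], u[1])


def is_positive(u):
    """Nonzero, and (leading coeff of numerator) / (leading coeff of denominator) > 0."""
    num, den = trim(u[0]), trim(u[1])
    return bool(num) and Fraction(num[-1]) / Fraction(den[-1]) > 0


x = ([0, 1], [1])
samples = [x, ([1], [0, 1]), ([0, -5, 1], [3, -1]), ([2, 0, -1], [1, 1]), ([-7], [1])]
print([is_positive(u) for u in samples])
```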
64. Verify the following properties of an ordered field $k$ when $P$ is the set of positive
elements:

(a) $1$ is in $P$,

(b) every nonzero square is in $P$,

(c) whenever $a$ is in $P$, then so is $a^{-1}$,

(d) $k$ is formally real,

(e) $k$ has characteristic 0.

65. In an ordered field $k$ whose set of positive elements is $P$, define $x > y$ and $y < x$
to mean that $x - y$ is in $P$. Let $a, b, c, d$ be in $k$. Check the following:

(a) exactly one of the relations $a > b$, $a = b$, and $a < b$ holds,

(b) if $a > b$ and $b > c$, then $a > c$,

(c) if $a > b$, then $a + c > b + c$,

(d) if $a > b$ and $c > 0$, then $ac > bc$,

(e) if $a > b > 0$, then $b^{-1} > a^{-1}$,

(f) if $a > b > 0$ and $c > d > 0$, then $ac > bd$,

(g) if $a > b$ and $c > d$, then $ac + bd > ad + bc$.

66. Let $k$ be an ordered field with $P$ as its set of positive elements, let $k(x)$ be a
transcendental extension, and define the positive elements of $k(x)$ to be those
for which the quotient of the leading coefficient of the numerator by the leading
coefficient of the denominator is in $P$. Show that with this definition of the set
of positive elements, $k(x)$ becomes an ordered field in which $x > n$ for every
positive integer $n$. (Then also $1/n > 1/x$ for every positive integer $n$ by Problem
65e.)

67. (a) Show that $\mathbb{Q}(\sqrt 2)$ becomes an ordered field in two distinct ways.

(b) If $k$ is an ordered field with $P$ as its set of positive elements and if $c$ is a
member of $P$ that is not a square, show that there are two ways of defining
the set of positive elements $P'$ of $K = k(\sqrt c)$ so that $K$ becomes an ordered
field with $P \subseteq P'$.

68. Let $k$ be an ordered field, and let $K$ be the extension that arises by adjoining the
square roots of all the positive elements of $k$. Prove that $K$ is a formally real
field by carrying out the following steps:

(a) Show that if $n$ is chosen as small as possible so that an equation
$-1 = \sum_{j=1}^{k} p_j x_j^2$ holds in $K$ with all $p_j$ positive in $k$ and all $x_j$ in an extension
$k(\sqrt{c_1}, \dots, \sqrt{c_n})$ of $k$ with all $c_j$ positive in $k$, then writing
$$k(\sqrt{c_1}, \dots, \sqrt{c_n}) = k(\sqrt{c_1}, \dots, \sqrt{c_{n-1}})(\sqrt{c_n})$$
leads to an equation
$$-1 = \sum_{j=1}^{k} p_j a_j^2 + \sum_{j=1}^{k} p_j c_n b_j^2 + 2\sqrt{c_n}\sum_{j=1}^{k} p_j a_j b_j \qquad (*)$$
in which $a_j$ and $b_j$ are in $k(\sqrt{c_1}, \dots, \sqrt{c_{n-1}})$.
(b) Consider the third term on the right side of $(*)$, and show that a contradiction
results if this term is 0 and a different contradiction arises if this term is not 0.

69. Let $k$ be a formally real field, and let $\bar k$ be an algebraic closure. Show that there
exist maximal formally real subfields of $\bar k$ containing $k$, and show that any such
is a real closed field.

70.  Carry  out  the  following  steps  to  show  that  a  real  closed  field  k  becomes  an  ordered 
field  in  one  and  only  one  way: 

(a) Suppose that $c \ne 0$ is not a square, hence that $k(\sqrt c)$ is a quadratic extension
of $k$. Why is $-1 = \sum_{j=1}^{n}(a_j + b_j\sqrt c)^2$ for suitable members $a_j$ and $b_j$ of $k$?

(b)  By  expanding  the  identity  in  (a),  show  that  c  is  not  a  sum  of  squares.  In 
other  words,  every  sum  of  squares  in  k  is  a  square  in  k. 

(c) Solve for $c$ in the expansion in (b), and conclude that $-c$ is a square.

(d)  Conclude  from  the  previous  steps  that  the  choice  of  P  as  the  set  of  nonzero 
squares makes $k$ into an ordered field and that there is no other possible
definition for the set $P$ of positive elements that makes $k$ into an ordered field.

71.  Carry  out  the  following  steps  to  show  that  in  any  real  closed  field  k,  every 
polynomial  of  odd  degree  has  a  root: 

(a)  Show  by  induction  that  it  is  enough  to  handle  irreducible  polynomials  of 
odd  degree. 

(b) For an irreducible polynomial $Q(X)$ of odd degree $n$, let $k(\alpha)$ be a simple
algebraic extension of $k$ such that $Q(\alpha) = 0$. Show that an expression of $-1$
as a sum of squares in $k(\alpha)$ forces an identity
$\sum_{j=1}^{m} R_j(X)^2 + Q(X)A(X) = -1$ for suitable polynomials $R_j(X)$ in $k[X]$ of
degree $\le n - 1$ and some polynomial $A(X)$ in $k[X]$ of odd degree $\le n - 2$.

(c) If $r$ is a root of the polynomial $A(X)$ in (b), show that $\sum_{j=1}^{m} R_j(r)^2 = -1$,
and deduce a contradiction.

72. By using the results of Problems 70-71 and taking into account the proof of
Theorem 1.18 that appears in Section IX.10, prove that if $k$ is a real closed field,
then $k(\sqrt{-1})$ is algebraically closed.

73. Put the above results together to give a proof of the existence part of the Artin-Schreier
Theorem: if an ordered field $k$ has $P$ as its set of positive elements and $\bar k$ as an
algebraic closure, then there exists a real closed field $K$ with $k \subseteq K \subseteq \bar k$ such
that $\bar k = K(\sqrt{-1})$ and such that $P$ is contained in the set of squares in $K$, i.e.,
such that the set of positive elements in the natural ordered-field structure on $K$
contains $P$.
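Problem 67(a) can be made concrete by pulling back the order of $\mathbb{R}$ along the two embeddings of $\mathbb{Q}(\sqrt 2)$ into $\mathbb{R}$. One embedding sends $\sqrt 2$ to the positive real square root of 2, and the other sends it to the negative one.

The Python sketch below stores an element as a pair $(a, b)$ standing for $a + b\sqrt 2$ with $a, b$ rational. It decides signs exactly, without floating point, and uses hypothetical helper names. Under the two resulting orderings, $\sqrt 2$ is positive in one and negative in the other, so the orderings are genuinely distinct.

```python
from fractions import Fraction


def sign_a_plus_t_sqrt2(a, t):
    """Exact sign (1, 0 or -1) of the real number a + t*sqrt(2), for rational a and t."""
    if a >= 0 and t >= 0:
        return 0 if a == 0 and t == 0 else 1
    if a <= 0 and t <= 0:
        return -1
    # Here a and t are nonzero with opposite signs; a^2 != 2 t^2 since sqrt(2) is irrational.
    if a > 0:
        return 1 if a * a > 2 * t * t else -1
    return 1 if 2 * t * t > a * a else -1


def is_positive(elem, s):
    """Order on Q(sqrt 2) pulled back along the embedding sqrt 2 -> s * sqrt 2 (s = +1 or -1)."""
    a, b = elem
    return sign_a_plus_t_sqrt2(Fraction(a), s * Fraction(b)) == 1


def add(u, v):
    return (u[0] + v[0], u[1] + v[1])


def mul(u, v):
    # (a + b sqrt2)(c + d sqrt2) = (ac + 2bd) + (ad + bc) sqrt2
    return (u[0] * v[0] + 2 * u[1] * v[1], u[0] * v[1] + u[1] * v[0])


sqrt2 = (0, 1)
print(is_positive(sqrt2, +1), is_positive(sqrt2, -1))  # prints: True False
```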
Abstract. This chapter contains two sets of tools for working with modules over a ring $R$ with
identity. The first set concerns finiteness conditions on modules, and the second set concerns the
Hom and tensor product functors.

Sections 1-3 concern finiteness conditions on modules. Section 1 deals with simple and semisimple
modules. A simple module over a ring is a nonzero unital module with no proper nonzero
submodules, and a semisimple module is a module generated by simple modules. It is proved that
semisimple modules are direct sums of simple modules and that any quotient or submodule of a
semisimple module is semisimple. Section 2 establishes an analog for modules of the Jordan-Hölder
Theorem  for  groups  that  was  proved  in  Chapter  IV;  the  theorem  says  that  any  two  composition  series 
have  matching  consecutive  quotients,  apart  from  the  order  in  which  they  appear.  Section  3  shows 
that  a  module  has  a  composition  series  if  and  only  if  it  satisfies  both  the  ascending  chain  condition 
and  the  descending  chain  condition  for  its  submodules. 

Sections 4-6 concern the Hom and tensor product functors. Section 4 regards $\mathrm{Hom}_R(M, N)$,
where $M$ and $N$ are unital left $R$ modules, as a contravariant functor of the $M$ variable and as a
covariant functor of the $N$ variable. The section examines the interaction of these functors with
the direct sum and direct product functors, the relationship between Hom and matrices, the role
of bimodules, and the use of Hom to change the underlying ring. Section 5 introduces the tensor
product $M \otimes_R N$ of a unital right $R$ module $M$ and a unital left $R$ module $N$, regarding tensor
product as a covariant functor of either variable. The section examines the effect of interchanging
$M$ and $N$, the interaction of tensor product with direct sum, an associativity formula for triple tensor
products, an associativity formula involving a mixture of Hom and tensor product, and the use of
tensor product to change the underlying ring. Section 6 introduces the notions of a complex and an
exact sequence in the category of all unital left $R$ modules and in the category of all unital right $R$
modules. It shows the extent to which the Hom and tensor product functors respect exactness for
part of a short exact sequence, and it gives examples of how Hom and tensor product may fail to
respect exactness completely.


1.  Simple  and  Semisimple  Modules 

This  chapter  develops  further  theory  for  unital  modules  over  a  ring  with  identity 
beyond what is in Section VIII.1. Results about modules that take advantage of
commutativity  of  the  ring  were  included  in  Chapter  VIII.  In  the  present  chapter 
the  ring  may  or  may  not  be  commutative.  We  shall  be  interested  in  those  modules 
whose  structure  is  especially  easy  to  analyze  and  in  constructions  that  create  new 
modules  from  old  ones.  The  chapter  consists  of  tools  for  working  with  such 
modules and their related rings and algebras. There are no major theorems in the
chapter,  but  the  material  here  is  essential  for  the  developments  in  several  of  the 
chapters  of  Advanced  Algebra. 

Throughout  this  chapter,  R  will  denote  a  ring  with  identity.  We  shall  work 
with  the  category  C  of  all  unital  left  R  modules.  Specifically  the  objects  of 
C  are  left  unital  R  modules,  and  the  space  of  morphisms  between  two  such 
modules  M  and  N  consists  of  all  R  homomorphisms  from  M  into  N.  It  is 
customary to write $\mathrm{Hom}_R(M, N)$ for this set of morphisms.¹ In the special case
that $R$ is a field, the notation $\mathrm{Hom}_R(M, N)$ reduces to notation we introduced in
Section II.3 for the set of linear maps from one vector space over $R$ to another.
For general $R$, the set $\mathrm{Hom}_R(M, N)$ is an abelian group under addition of the
values: $(\varphi_1 + \varphi_2)(m) = \varphi_1(m) + \varphi_2(m)$. Without some further hypothesis on $R$,
$\mathrm{Hom}_R(M, N)$ does not have a natural $R$ module structure.

However, there is some residual action by scalars. Any element $c$ in the center
$Z$ of $R$, i.e., any element with $cr = rc$ for all $r$ in $R$, acts on $\mathrm{Hom}_R(M, N)$. The
definition is that $(c\varphi)(m) = \varphi(cm)$. The function $c\varphi$ certainly respects addition,
and it respects action by a scalar $r$ in $R$ because
$(c\varphi)(rm) = \varphi(crm) = \varphi(rcm) = r\varphi(cm) = r(c\varphi)(m)$;
thus $c\varphi$ is in $\mathrm{Hom}_R(M, N)$, and $\mathrm{Hom}_R(M, N)$ becomes a
$Z$ module. The center $Z$ automatically contains the multiplicative identity 1 and
its integer multiples $\mathbb{Z}1$.

We  shall  tend  to  ignore  this  action  by  the  center  except  in  two  special  cases. 
One is that $R$ is commutative, and then $\mathrm{Hom}_R(M, N)$ is an $R$ module. The other
is that $R$ is an associative algebra (with identity) over a field $F$. In this case the
action of members of $F$ on the identity of $R$ embeds $F$ into $R$, and $F$ may thus
be identified with a subfield of the center of $R$. The result is that when $R$ is an
associative algebra over a field $F$, then $\mathrm{Hom}_R(M, N)$ is a vector space over $F$.

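We write $\mathrm{End}_R(M)$ for $\mathrm{Hom}_R(M, M)$. This abelian group has the structure
of a ring with identity, multiplication being composition: $(\varphi\psi)(m) = \varphi(\psi(m))$.
The distributive laws need to be checked: the formula $(\varphi_1 + \varphi_2)\psi = \varphi_1\psi + \varphi_2\psi$
is immediate from the calculation
$$((\varphi_1 + \varphi_2)\psi)(m) = (\varphi_1 + \varphi_2)(\psi(m)) = \varphi_1(\psi(m)) + \varphi_2(\psi(m)) = (\varphi_1\psi + \varphi_2\psi)(m),$$
while the formula $\varphi(\psi_1 + \psi_2) = \varphi\psi_1 + \varphi\psi_2$ makes use of the fact that $\varphi$ respects
addition and is proved by the calculation
$$(\varphi(\psi_1 + \psi_2))(m) = \varphi(\psi_1(m) + \psi_2(m)) = \varphi(\psi_1(m)) + \varphi(\psi_2(m)) = (\varphi\psi_1 + \varphi\psi_2)(m).$$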

¹The notation Hom(M, N) with no subscript is sometimes used for $\mathrm{Hom}_{\mathbb{Z}}(M, N)$, i.e., to denote
the group of homomorphisms from one abelian group to another.
If $Z$ is the center of $R$, then $\mathrm{End}_R(M)$ is a $Z$ module, as well as a ring, and
the two structures are compatible; the result is that $\mathrm{End}_R(M)$ is an associative $Z$
algebra in the sense of Example 15 in Section VIII.1. In particular, when $R$ is an
associative algebra over a field $F$, then $\mathrm{End}_R(M)$ is an associative $F$ algebra.

There  is  usually  no  need  to  re-prove  for  right  R  modules  an  analog  of  each 
result about left $R$ modules. The reason is that we can make use of the opposite
ring $R^o$ of $R$, defined to be the same underlying abelian group but with reversed
multiplication: $a \circ b = ba$. Any left $R$ module $M$ then becomes a right $R^o$
module $M^o$ under the definition $mr^o = rm$ for $r$ in $R$, $m$ in $M$, and $r^o$ equal to
the same set-theoretic member of $R^o$ as $r$. The theory of unital left $R$ modules
for all $R$ thereby yields a theory of unital right $R$ modules for all $R$.
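The verification that $M^o$ is a right $R^o$ module comes down to the identity
$(mr^o)s^o = s(rm) = (sr)m = m(r^o \circ s^o)$.
The Python sketch below spot-checks this identity for the hypothetical choice $R = M_2(\mathbb{Z})$ acting on integer column vectors. It also shows that the same recipe with the original multiplication of $R$ in place of $\circ$ fails the right-module axiom. The helper names are illustrative.

```python
import random


def mat_mul(A, B):
    """Product in R = M_2(Z); matrices are lists of rows."""
    return [[sum(A[i][t] * B[t][j] for t in range(2)) for j in range(2)] for i in range(2)]


def act_left(r, m):
    """Left R action on column vectors: r m."""
    return [sum(r[i][j] * m[j] for j in range(2)) for i in range(2)]


def opposite_mul(a, b):
    """The product a o b in the opposite ring R^o, namely ba computed in R."""
    return mat_mul(b, a)


def act_right_opposite(m, r):
    """Right R^o action on M^o defined by m r^o = r m."""
    return act_left(r, m)


def rand_mat():
    return [[random.randint(-3, 3) for _ in range(2)] for _ in range(2)]


random.seed(0)
for _ in range(200):
    r, s, m = rand_mat(), rand_mat(), [random.randint(-3, 3) for _ in range(2)]
    # Right-module axiom (m r^o) s^o = m (r^o o s^o) for the opposite ring.
    assert act_right_opposite(act_right_opposite(m, r), s) == act_right_opposite(m, opposite_mul(r, s))

# With the ordinary product of R in place of o, the axiom fails in general.
E12, E21, e1 = [[0, 1], [0, 0]], [[0, 0], [1, 0]], [1, 0]
lhs = act_right_opposite(act_right_opposite(e1, E12), E21)
rhs_wrong = act_right_opposite(e1, mat_mul(E12, E21))
print(lhs, rhs_wrong)
```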

A unital left $R$ module $M$ is said to be simple, or irreducible, if $M \ne 0$ and if
$M$ has no proper nonzero $R$ submodules. If $M$ is simple, then $M = Rx$ for each
$x \ne 0$ in $M$; conversely if $M \ne 0$ has $M = Rx$ for each $x \ne 0$ in $M$, then $M$ is
simple. Whenever $M = Rx$ for an element $x$, then $M$ is isomorphic as a unital
left $R$ module to $R/I$, where $I$ is the left ideal $I = \{r \in R \mid rx = 0\}$.

A  unital  left  R  module  M  is  said  to  be  semisimple  if  M  is  generated  by  simple 
left  R  submodules,  i.e.,  if  it  is  the  sum  of  simple  left  R  submodules.  In  this 
definition,  the  sum  may  be  empty  (and  then  M  =  0),  it  may  be  finite,  or  it  may 
be  infinite.  Evidently  simple  implies  semisimple  for  unital  left  R  modules. 

We  come  to  examples  in  a  moment.  First  we  prove  that  the  sum  of  simple  left 
R  modules  in  a  semisimple  module  may  always  be  taken  to  be  a  direct  sum,  i.e., 
that  semisimple  modules  are  completely  reducible. 

Proposition  10.1.  If  the  unital  left  R  module  M  is  semisimple,  then  M 
is  the  direct  sum  of  some  family  of  simple  R  submodules.  In  more  detail  if 
$\{M_s \mid s \in S\}$ is a family of simple $R$ submodules of the unital left $R$ module $M$
whose sum is $M$, then there is a subset $T$ of $S$ with the property that
$$M = \bigoplus_{t \in T} M_t.$$

PROOF. Call a subset $U$ of $S$ “independent” if the sum $\sum_{u \in U} M_u$ is direct.
This condition means that for every finite subset $\{u_1, \dots, u_n\}$ of $U$ and every
set of elements $m_j \in M_{u_j}$, the equation $m_1 + \cdots + m_n = 0$ implies that each
$m_j$ is 0. From this formulation it follows that the union of any increasing chain
of independent subsets of $S$ is itself independent. By Zorn’s Lemma there is a
maximal independent subset $T$ of $S$. By definition the sum $M_0 = \sum_{t \in T} M_t$ is
direct. Consequently it suffices to show that $M_0$ is all of $M$. By the hypothesis
on $S$, it is enough to show that each $M_s$ is contained in $M_0$. For $s$ in $T$, this
conclusion is clear. Thus suppose $s$ is not in $T$. By the maximality of $T$, $T \cup \{s\}$
is not independent. Consequently the sum $M_s + M_0$ is not direct, and it follows
that $M_s \cap M_0 \ne 0$. But this intersection is an $R$ submodule of $M_s$. Since $M_s$ is
simple, a nonzero $R$ submodule of $M_s$ must be all of $M_s$. Thus $M_s \cap M_0 = M_s$,
and $M_s$ is contained in $M_0$. □
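When $S$ is finite, Zorn's Lemma can be replaced by a single pass over $S$. The pass keeps $s$ exactly when $M_s$ is not already contained in the sum of the modules kept so far.

The Python sketch below does this in the setting of Example 1. It assumes $M = \mathbb{Q}^n$ and $M_s = \mathbb{Q}v_s$ for finitely many nonzero vectors $v_s$, so that containment can be tested by a rank computation; the function names are hypothetical. A skipped vector lies in the span of the kept ones. This is the finite counterpart of the step in the proof where $M_s \cap M_0 \ne 0$ forces $M_s \subseteq M_0$.

```python
from fractions import Fraction


def rank(vectors):
    """Rank over Q of a list of vectors, by Gaussian elimination."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    r = 0
    ncols = len(rows[0]) if rows else 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r


def independent_subfamily(vectors):
    """Indices T, chosen greedily, such that the lines Q v_t (t in T) form a direct sum
    equal to the sum of all the lines Q v_s."""
    T = []
    for s, v in enumerate(vectors):
        if rank([vectors[t] for t in T] + [v]) > len(T):  # Q v_s not inside the current sum
            T.append(s)
    return T


vectors = [[1, 0, 0], [2, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 3], [1, 1, 1]]
print(independent_subfamily(vectors))
```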

Examples  of  semisimple  modules. 

(1)  Let  F  be  a  field.  Left  and  right  amount  to  the  same  thing  for  modules  when 
the  underlying  ring  is  commutative.  We  know  that  the  unital  F  modules  are  just 
the vector spaces over $F$. Such a vector space $V$ is a simple $F$ module if and
only if it is 1-dimensional, since 1-dimensionality is the necessary and sufficient
condition to have $V \ne 0$ be of the form $V = Fx$ for all $x \ne 0$ in $V$. Any vector
space $V$ is the sum of all of its 1-dimensional subspaces, and consequently every
unital  F  module  is  semisimple.  Theorem  2.42  shows  that  each  vector  space  V 
has  a  basis;  this  theorem  is  therefore  a  special  case  of  Proposition  10.1,  which 
says  that  any  semisimple  module  is  the  direct  sum  of  simple  modules. 

(2)  Let  D  be  a  division  ring.  Division  rings  were  defined  in  Section  IV.4  as 
rings with identity $1 \ne 0$ such that the nonzero elements form a group under
multiplication. Every field is a division ring, and the quaternions form a division
ring that is not a field. Let $M$ be a unital left $D$ module, and let $x \ne 0$ be in
$M$. Then the left $D$ module $Dx$ is simple because if $N \subseteq Dx$ is a nonzero $D$
submodule and if $y \ne 0$ is in $N$, then we can write $y = dx$ with $d \ne 0$ in $D$ and see from
the formula $d^{-1}y = x$ that $x$ is in $N$ and $N = Dx$. Any unital left $D$ module is
the sum of its $D$ submodules $Dx$ for $x$ in $M$, and therefore every unital left $D$
module is semisimple. From Proposition 10.1 we can conclude that every unital
left $D$ module $M$ is the direct sum of simple modules. In other words, $M$ has a
basis, just as if $D$ were a field. Consequently it is customary to refer to unital left
$D$ modules as left vector spaces over $D$. A notion of (left) dimension, equal to
a well-defined nonnegative integer or $\infty$, will emerge from the discussion in the
next  section. 

(3)  Let  D  be  a  division  ring.  Section  V.2  introduced  the  ring  of  n-by-n  matrices 
over any commutative ring with identity, and Example 4 of rings in Section VIII.1
extended the definition to the case that the ring is noncommutative. Thus let $R$
be the ring $M_n(D)$. Let $M = D^n$ be the abelian group of $n$-component column
vectors with entries in $D$. Under multiplication of matrices times column vectors,
$M$ becomes a unital left $R$ module. Let us prove that $M$ is simple. It is enough to
show that $Rm = M$ for every nonzero $m$ in $M$. Let $m'$ be in $M$ with entries $m'_i$,
and suppose that the $j_0$th component $m_{j_0}$ of $m$ is $\ne 0$. Then we can multiply on
the left of $m$ by the matrix $r$ whose $(i, j)$th entry $r_{ij}$ is $m'_i m_{j_0}^{-1}$ if $(i, j) = (i, j_0)$
and is 0 otherwise, and the product is the column vector $m'$. Thus $m'$ is in $Rm$,
and $Rm = M$ as required. Hence $M = D^n$ is an example of a simple $R$ module.

(4) Again let $D$ be a division ring, and let $R = M_n(D)$. Let us see that the left
$R$ module $R$ is semisimple. In fact, if $R_j$ is the additive subgroup of $R$ whose
nonzero entries are all in the $j$th column, then $R_j$ is a left $R$ submodule of $R$ that
is $R$ isomorphic to $D^n$. Thus we see that $R = R_1 \oplus \cdots \oplus R_n$ as left $R$ modules,
and the left $R$ module $R$ is semisimple as a consequence of Example 3.

(5) Let $G$ be a group, and let $\mathbb{C}G$ be the complex group algebra defined
in Example 16 in Section VIII.1. Let $V$ be a vector space over $\mathbb{C}$, and let
$\Phi: G \to GL(V)$ be a representation of $G$ on $V$. The universal mapping
property of complex group algebras described in that example and pictured in
Figure 8.4 shows that the representation of $G$ extends to $\mathbb{C}G$ and makes $V$
into a unital left $\mathbb{C}G$ module. Conversely if the complex vector space $V$ is a
unital left $\mathbb{C}G$ module, then we obtain a representation of $G$ by restriction from
$\mathbb{C}G$ to $G$. What needs to be checked here is that each member of $G$ acts by an
invertible linear mapping. This is a consequence of the unital property; since
1 acts as 1, the action by $g^{-1}$ inverts the action of $g$. Thus we have a one-one
correspondence of representations of $G$ on complex vector spaces with unital left
$\mathbb{C}G$ module structures. Under this correspondence, irreducible representations
of $G$ (i.e., nonzero representations having no proper nonzero invariant subspace)
correspond to simple $\mathbb{C}G$ modules. Now suppose that $G$ is finite. Readers
who have looked at Section VII.4 know from Corollary 7.21 that every
finite-dimensional representation of a finite group $G$ on a complex vector space is the
direct sum of irreducible representations; the corresponding $\mathbb{C}G$ modules are
therefore semisimple. But more is true. If $V$ is any $\mathbb{C}G$ module for the finite
group $G$ and if $x$ is in $V$, then $\mathbb{C}Gx$ is a vector subspace spanned by $\{gx \mid g \in G\}$
and consequently is finite-dimensional. Applying what is known from Section
VII.4, we can write $\mathbb{C}Gx$ as the direct sum of simple $\mathbb{C}G$ modules. Therefore
the sum of all simple $\mathbb{C}G$ modules in $V$ is all of $V$, and $V$ is semisimple. Thus
every unital left $\mathbb{C}G$ module is semisimple if $G$ is a finite group, and by
Proposition 10.1 every such module is the direct sum of simple $\mathbb{C}G$ modules.
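The matrix $r$ used in Example 3 is explicit enough to compute. The Python sketch below takes $D = \mathbb{Q}$ purely for illustration, with exact `Fraction` arithmetic. For a noncommutative $D$ the factors in $m'_i m_{j_0}^{-1}$ would have to stay in that order. The sketch builds $r$ from $m$ and a target $m'$ and confirms that $rm = m'$.

It also spot-checks the observation in Example 4 that left multiplication by any matrix sends a matrix supported in column $j$ to another matrix supported in column $j$. The function names and sample entries are hypothetical.

```python
from fractions import Fraction


def matrix_sending(m, m_target):
    """Matrix r with r m = m_target, built as in Example 3: only column j0 is nonzero."""
    n = len(m)
    j0 = next(j for j, x in enumerate(m) if x != 0)  # a component of m that is nonzero
    r = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        r[i][j0] = Fraction(m_target[i]) / Fraction(m[j0])  # m'_i * m_{j0}^{-1}
    return r


def apply(r, m):
    return [sum(r[i][j] * m[j] for j in range(len(m))) for i in range(len(r))]


def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][t] * B[t][j] for t in range(n)) for j in range(n)] for i in range(n)]


def supported_in_column(A, j):
    return all(A[i][k] == 0 for i in range(len(A)) for k in range(len(A)) if k != j)


m = [0, 3, -1]
m_prime = [5, 0, 2]
r = matrix_sending(m, m_prime)
print(apply(r, m) == m_prime)

# Example 4: R_j (matrices supported in column j) is stable under left multiplication.
a = [[1, 2, 0], [-1, 0, 4], [3, 1, 1]]
col1 = [[0, 7, 0], [0, -2, 0], [0, 5, 0]]
print(supported_in_column(mat_mul(a, col1), 1))
```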

The  next  proposition  shows  that  decompositions  of  semisimple  modules  as 
direct  sums  of  simple  modules  behave  in  a  fashion  analogous  to  decompositions 
of vector spaces as direct sums of 1-dimensional vector subspaces. However,
the  simple  modules  need  not  all  be  isomorphic  to  one  another,  as  is  shown  by 
Example  5.  A  theory  that  takes  the  isomorphism  types  of  simple  modules  into 
account  appears  in  Problems  12-20  at  the  end  of  the  chapter. 

Proposition  10.2.  Let  M  be  a  semisimple  left  R  module,  and  suppose  that 
$M = \bigoplus_{s \in S} M_s$ is the direct sum of simple $R$ modules $M_s$. Let $N$ be any $R$
submodule of $M$. Then

(a) the quotient module $M/N$ is semisimple. In more detail there is a subset
$T$ of $S$ with the property that the submodule $M_T = \bigoplus_{t \in T} M_t$ of $M$ maps
$R$ isomorphically onto $M/N$.

(b) $N$ is a direct summand of $M$. In more detail, $M = N \oplus M_T$, where $M_T$
is as in (a).

(c) $N$ is semisimple. In more detail choose $T$ as in (a), and write $T'$ for the
complement of $T$ in $S$. Then the quotient mapping $M \to M/M_T$ restricts
to an $R$ isomorphism of $N$ onto $M/M_T$, and $M/M_T$ is $R$ isomorphic to
$M_{T'}$.


PROOF. Each simple $R$ submodule $M_s$ of $M$ maps to an $R$ submodule $\overline{M}_s$ of
$M/N$. This image either is simple (and then is $R$ isomorphic to $M_s$) or is zero.
We let $U$ be the subset of $S$ for which it is simple. Then $M/N$ is evidently the
sum of the simple $R$ submodules $\{\overline{M}_s \mid s \in U\}$. By Proposition 10.1 there is a
subset $T$ of $U$ such that
$$M/N = \bigoplus_{t \in T} \overline{M}_t.$$
Since each $M_t$ with $t$ in $T$ maps isomorphically onto $\overline{M}_t$, this proves (a).

For (b), we use the following elementary observation: if $N$ and $N'$ are $R$
submodules of $M$, then $M = N \oplus N'$ if and only if the quotient map $M \to M/N$
carries $N'$ isomorphically onto the quotient $M/N$. Taking $N' = M_T$ and applying
(a), we obtain (b).

For (c), the same observation when applied first to $M = N \oplus M_T$ and then to
$M = M_{T'} \oplus M_T$ shows that the quotient map $M \to M/M_T$ carries $N$ isomorphically
onto $M/M_T$ and carries $M_{T'}$ isomorphically onto $M/M_T$. Therefore
$N \cong M/M_T \cong M_{T'}$, and (c) is proved. □

In the context of simple modules, $\mathrm{Hom}_R(M, N)$ has special properties. Readers
who have looked at Section VII.4 have seen these special properties in the
context of representations of finite groups on complex vector spaces. There they
were captured by Schur’s Lemma (Proposition 7.18). If we pass from representations
on complex vector spaces to $\mathbb{C}G$ modules, following the prescription in
Example 5, we obtain a result about $\mathrm{Hom}_{\mathbb{C}G}(M, N)$ when $G$ is a finite group.
Lemma 10.3 and Proposition 10.4 generalize this to a result about $\mathrm{Hom}_R(M, N)$
for arbitrary $R$.

Lemma 10.3. Suppose that $E$ is a simple left $R$ module and that
$M = \bigoplus_{\alpha \in A} M_\alpha$ is a direct-sum decomposition of the unital left $R$ module $M$ into
arbitrary $R$ submodules, not necessarily simple. Then
$$\mathrm{Hom}_R(E, M) \cong \bigoplus_{\alpha \in A} \mathrm{Hom}_R(E, M_\alpha)$$
as an isomorphism of abelian groups.
Remarks. The hypothesis that $E$ is simple is critical here. Without it a
map into a direct sum might have nonzero projections into infinitely many of the
summands, and then it could not be represented as a finite sum of maps into summands.
Proposition 10.12 below will point out that the correct identity without a
special hypothesis on $E$ is $\mathrm{Hom}_R(E, \prod_{\alpha \in A} M_\alpha) \cong \prod_{\alpha \in A} \mathrm{Hom}_R(E, M_\alpha)$.

PROOF. Suppose $\varphi$ is in $\operatorname{Hom}_R(E, M)$. Write $\varphi_\alpha$ for the composition of $\varphi$ with the projection $M \to M_\alpha$. The map from left to right in the displayed isomorphism is to be $\varphi \mapsto \{\varphi_\alpha\}_{\alpha \in A}$. Suppose for the moment that the image is contained in the direct sum on the right. The mapping is one-one since M is the sum of the $M_\alpha$'s, and it is onto since the mapping is the identity on each subgroup $\operatorname{Hom}_R(E, M_\alpha)$ of $\operatorname{Hom}_R(E, M)$.

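Thus we must show for each $\varphi$ that only finitely many of the maps $\varphi_\alpha$ are nonzero. This is clear if $\varphi = 0$, so assume $\varphi \ne 0$. Choose e in E with $\varphi(e) \ne 0$, and write

$$\varphi(e) = m_1 + \cdots + m_n \quad \text{with } m_i \in M_{\alpha_i}.$$

Since E is simple and $e \ne 0$, $E = Re$. Therefore

$$\varphi(E) = R\varphi(e) = R(m_1 + \cdots + m_n) \subseteq Rm_1 + \cdots + Rm_n \subseteq M_{\alpha_1} \oplus \cdots \oplus M_{\alpha_n}.$$

Consequently only $\varphi_{\alpha_1}, \dots, \varphi_{\alpha_n}$ can be nonzero. □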

Lemma 10.3 enables us to study maps between semisimple modules in terms of maps between simple modules. The latter are described by the next result.

Proposition 10.4 (Schur's Lemma). Suppose that M and N are simple left R modules.

(a) If M and N are not R isomorphic, then $\operatorname{Hom}_R(M, N) = 0$.

(b) $\operatorname{End}_R(M)$ is a division ring.

(c) (Dixmier) If R is an associative algebra over an algebraically closed field F and if the vector-space dimension of M over F is less than the cardinality of F, then $\operatorname{End}_R(M)$ consists of the F multiples of the identity.

Remark. In the setting of representations of a finite group G as in Section VII.4, or in the case that G is a finite group and $R = \mathbb{C}G$ in the current setting, any singly generated R module such as M or N is finite-dimensional over $\mathbb{C}$. Part (a) in that case reduces to the statement that the vector space of intertwining operators between two inequivalent irreducible representations is 0. Part (c) in that case says that the space of self-intertwining operators for an irreducible representation consists of the scalar multiples of the identity. For a general R, we get only the weaker conclusion of (b) that $\operatorname{End}_R(M)$ is a division ring. If R is an associative algebra over a field F, we have seen that $\operatorname{End}_R(M)$ is an associative algebra over F, and (c) gives a condition under which we can improve upon (b).
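As a quick sanity check of (a) and (b) in the simplest case, one can take $R = \mathbb{Z}$, whose simple modules are the cyclic groups $\mathbb{Z}/p$ of prime order. The Python sketch below is only an illustration under that assumption, and the helper names `hom_count` and `endomorphisms_invertible` are hypothetical. It counts homomorphisms $\mathbb{Z}/m \to \mathbb{Z}/n$ by brute force (such a map is determined by the image x of 1, subject to $mx = 0$ in $\mathbb{Z}/n$) and checks whether every nonzero endomorphism of $\mathbb{Z}/k$ is bijective.

```python
from math import gcd


def hom_count(m: int, n: int) -> int:
    """Count Z-module homomorphisms Z/m -> Z/n by brute force.

    Such a map is determined by the image x of the class of 1,
    and x is admissible exactly when m * x = 0 in Z/n.
    """
    return sum(1 for x in range(n) if (m * x) % n == 0)


def endomorphisms_invertible(k: int) -> bool:
    """Check that every nonzero endomorphism x -> a*x of Z/k is bijective."""
    for a in range(1, k):
        if len({(a * x) % k for x in range(k)}) != k:
            return False
    return True


# (a): distinct primes give Hom(Z/3, Z/5) = 0, i.e. only the zero map.
print(hom_count(3, 5))  # 1
# In general the number of homomorphisms Z/m -> Z/n is gcd(m, n).
print(all(hom_count(m, n) == gcd(m, n) for m in range(1, 13) for n in range(1, 13)))  # True
# (b): every nonzero endomorphism of the simple module Z/7 is invertible.
print(endomorphisms_invertible(7))  # True
# Z/4 is not simple, and multiplication by 2 is nonzero but not invertible.
print(endomorphisms_invertible(4))  # False
```

The last line shows why simplicity matters: without it, a nonzero endomorphism can have a nonzero kernel.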
PROOF. Suppose that $\varphi$ is nonzero in $\operatorname{Hom}_R(M, N)$. Then $\ker \varphi$ is a proper R submodule of M, and we must have $\ker \varphi = 0$ since M is simple. Similarly $\operatorname{image} \varphi$ is a nonzero R submodule of N, and we must have $\operatorname{image} \varphi = N$ since N is simple. Therefore $\varphi$ is an R isomorphism of M onto N. This proves (a), and taking N = M shows that every nonzero member of $\operatorname{End}_R(M)$ has an inverse in $\operatorname{End}_R(M)$, which proves (b).

For (c) let m be a nonzero element of M. The map $\varphi \mapsto \varphi(m)$ is F linear, and it is one-one from $\operatorname{End}_R(M)$ into M by (b), since a nonzero $\varphi$ is invertible and therefore cannot vanish at $m \ne 0$. Thus $\operatorname{End}_R(M)$, as an associative division algebra over F, has vector-space dimension at most the vector-space dimension of M, and the latter by hypothesis is strictly less than the cardinality of F. Arguing by contradiction, let us assume that $\operatorname{End}_R(M)$ is not equal to F, say that $\operatorname{End}_R(M)$ contains an element $\varphi$ not in F.

The smallest division subalgebra of $\operatorname{End}_R(M)$ containing F and $\varphi$ is the field $F(\varphi)$ generated by F and $\varphi$. Since F is algebraically closed, $\varphi$ is not a root of any nonzero polynomial with coefficients in F; otherwise $F(\varphi)$ would be a finite algebraic extension of F and hence equal to F. Thus the substitution homomorphism equal to the identity on F and carrying X to $\varphi$ is one-one from $F[X]$ into $F(\varphi)$. By the universal mapping property of fields of fractions (Proposition 8.6), the substitution homomorphism factors through the field of fractions $F(X)$. Thus we may regard $F(X)$ as a subfield of $F(\varphi)$. In the field $F(X)$, the set of elements $\{(X - c)^{-1} \mid c \in F\}$ is linearly independent over F, as we see by assuming a nontrivial linear dependence and clearing fractions, and hence $\dim_F F(X)$ is $\ge$ the cardinality of F. Since $\operatorname{End}_R(M) \supseteq F(\varphi) \supseteq F(X)$ under our identification, the dimension of $\operatorname{End}_R(M)$ over F is $\ge$ the cardinality of F. This conclusion contradicts the observation of the previous paragraph that the dimension of $\operatorname{End}_R(M)$ is strictly less than the cardinality of F. So the assumption that $\operatorname{End}_R(M)$ contains an element not in F must be false, and (c) follows. □
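The clearing-fractions step in the proof of (c) can be written out as follows; this is one standard way to carry it out, included only as a worked expansion of the linear-independence claim.

```latex
Suppose $\sum_{i=1}^{k} a_i (X - c_i)^{-1} = 0$ in $F(X)$, with $a_i \in F$ and
$c_1, \dots, c_k \in F$ distinct. Multiplying by $\prod_{j=1}^{k} (X - c_j)$ gives
\[
  \sum_{i=1}^{k} a_i \prod_{j \ne i} (X - c_j) = 0 \quad \text{in } F[X].
\]
Substituting $X = c_\ell$ kills every term with $i \ne \ell$, so
\[
  a_\ell \prod_{j \ne \ell} (c_\ell - c_j) = 0 ,
\]
and since the $c_j$ are distinct, $a_\ell = 0$ for every $\ell$.
```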


2. Composition Series

We continue with R as a ring with identity, and we work with the category of all unital left R modules. In this section we shall say what is meant by a unital left R module of “finite length,” and we shall investigate semisimplicity for such modules.

A finite filtration of a unital left R module M is a finite descending chain

$$M = M_0 \supseteq M_1 \supseteq \cdots \supseteq M_n = 0$$

of R submodules. We do not insist on this particular indexing, and with the obvious adjustments, we allow also a finite increasing chain to be called a finite filtration. Relative to the displayed inclusions, the modules $M_i/M_{i+1}$ for $0 \le i \le n - 1$ are called the consecutive quotients of the filtration. The finite filtration is called a composition series if the consecutive quotients are all simple R modules; in particular, they are to be nonzero. The consecutive quotients in this case are called composition factors.

We encountered an analogous notion with groups in Section IV.8, but there was a complication in that case. The complication was that each subgroup had to be normal in the next-larger subgroup in order for the consecutive quotients to be groups. The overlap between the current treatment and the earlier treatment occurs for abelian groups, which on the one hand are unital $\mathbb{Z}$ modules and on the other hand are groups whose subgroups are automatically normal.

We are going to obtain analogs for the category of unital left R modules of the group-theoretic results of Zassenhaus, Schreier, and Jordan-Hölder in Section IV.8. The ones here will be a little easier to prove than those in Section IV.8 since we do not have the complication of checking whether subgroups are normal. Let

$$M = M_0 \supseteq M_1 \supseteq \cdots \supseteq M_m = 0 \quad \text{and} \quad M = N_0 \supseteq N_1 \supseteq \cdots \supseteq N_n = 0$$

be two finite filtrations of M. We say that the second is a refinement of the first if there is a one-one function $f : \{0, \dots, m\} \to \{0, \dots, n\}$ with $M_i = N_{f(i)}$ for $0 \le i \le m$. The two finite filtrations of M are said to be equivalent if m = n and if the order of the consecutive quotients $M_0/M_1, M_1/M_2, \dots, M_{m-1}/M_m$ may be rearranged so that they are respectively isomorphic to $N_0/N_1, N_1/N_2, \dots, N_{m-1}/N_m$.

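Lemma 10.5 (Zassenhaus). Let $M_1$, $M_1'$, $M_2$, and $M_2'$ be R submodules of a unital left R module M with $M_1' \subseteq M_1$ and $M_2' \subseteq M_2$. Then

$$\big((M_1 \cap M_2) + M_1'\big)/\big((M_1 \cap M_2') + M_1'\big) \cong \big((M_1 \cap M_2) + M_2'\big)/\big((M_1' \cap M_2) + M_2'\big).$$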
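PROOF. By the Second Isomorphism Theorem (Theorem 8.4),

$$\begin{aligned}
(M_1 \cap M_2)/\big(((M_1 \cap M_2') + M_1') \cap (M_1 \cap M_2)\big)
&\cong \big((M_1 \cap M_2) + (M_1 \cap M_2') + M_1'\big)/\big((M_1 \cap M_2') + M_1'\big) \\
&= \big((M_1 \cap M_2) + M_1'\big)/\big((M_1 \cap M_2') + M_1'\big).
\end{aligned}$$

Since we have

$$\big((M_1 \cap M_2') + M_1'\big) \cap (M_1 \cap M_2) = \big((M_1 \cap M_2') + M_1'\big) \cap M_2 = (M_1 \cap M_2') + (M_1' \cap M_2),$$

where the first equality holds because $(M_1 \cap M_2') + M_1' \subseteq M_1$ and the second holds because $M_1 \cap M_2' \subseteq M_2$, we can rewrite the above isomorphism as

$$(M_1 \cap M_2)/\big((M_1 \cap M_2') + (M_1' \cap M_2)\big) \cong \big((M_1 \cap M_2) + M_1'\big)/\big((M_1 \cap M_2') + M_1'\big).$$

The left side of this isomorphism is symmetric under interchange of the indices 1 and 2. Hence so is the right side, and the lemma follows. □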
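For $R = \mathbb{Z}$ and $M = \mathbb{Z}/n$ the lemma can be checked exhaustively. The sketch below assumes the standard fact that every subgroup of $\mathbb{Z}/n$ is $d\mathbb{Z}/n\mathbb{Z}$ for a unique divisor d of n, so that sums, intersections, and containments become gcd, lcm, and divisibility of these encoding divisors; since subquotients of a cyclic group are cyclic, equal orders mean isomorphic quotients. The function names are hypothetical, and a finite check of this kind illustrates the statement rather than proving it.

```python
from itertools import product
from math import gcd


def lcm(a: int, b: int) -> int:
    return a * b // gcd(a, b)


def divisors(n: int) -> list[int]:
    return [d for d in range(1, n + 1) if n % d == 0]


# Encode the subgroup dZ/nZ of Z/n by the divisor d of n. Then
#   sum A + B     <->  gcd of the codes
#   intersection  <->  lcm of the codes
#   A contains B  <->  code of A divides code of B
def quotient_order(a: int, b: int) -> int:
    """Order of A/B for subgroups A ⊇ B of Z/n with codes a | b."""
    assert b % a == 0
    return b // a


def zassenhaus_holds(n: int) -> bool:
    for m1, m1p, m2, m2p in product(divisors(n), repeat=4):
        if m1p % m1 or m2p % m2:  # require M1' ⊆ M1 and M2' ⊆ M2
            continue
        left = quotient_order(gcd(lcm(m1, m2), m1p), gcd(lcm(m1, m2p), m1p))
        right = quotient_order(gcd(lcm(m1, m2), m2p), gcd(lcm(m1p, m2), m2p))
        if left != right:
            return False
    return True


print(zassenhaus_holds(72))  # True
```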
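Theorem 10.6 (Schreier). Any two finite filtrations of a module M in $\mathcal{C}$ have equivalent refinements.

Proof. Let the two finite filtrations be

$$M = M_0 \supseteq M_1 \supseteq \cdots \supseteq M_m = 0 \quad \text{and} \quad M = N_0 \supseteq N_1 \supseteq \cdots \supseteq N_n = 0,$$

and define

$$M_{ij} = (M_i \cap N_j) + M_{i+1} \quad \text{for } 0 \le i \le m - 1 \text{ and } 0 \le j \le n,$$

$$N_{ji} = (M_i \cap N_j) + N_{j+1} \quad \text{for } 0 \le i \le m \text{ and } 0 \le j \le n - 1.$$

Then

$$M = M_{00} \supseteq M_{01} \supseteq \cdots \supseteq M_{0n} \supseteq M_{10} \supseteq M_{11} \supseteq \cdots \supseteq M_{1n} \supseteq \cdots \supseteq M_{m-1,n} = 0$$

and

$$M = N_{00} \supseteq N_{01} \supseteq \cdots \supseteq N_{0m} \supseteq N_{10} \supseteq N_{11} \supseteq \cdots \supseteq N_{1m} \supseteq \cdots \supseteq N_{n-1,m} = 0$$

are refinements of the respective given filtrations. The containments $M_{in} \supseteq M_{i+1,0}$ and $N_{jm} \supseteq N_{j+1,0}$ are equalities here, and the only consecutive quotients that can be nonzero are therefore of the form $M_{ij}/M_{i,j+1}$ and $N_{ji}/N_{j,i+1}$. For these we have

$$\begin{aligned}
M_{ij}/M_{i,j+1} &= \big((M_i \cap N_j) + M_{i+1}\big)/\big((M_i \cap N_{j+1}) + M_{i+1}\big) && \text{by definition} \\
&\cong \big((M_i \cap N_j) + N_{j+1}\big)/\big((M_{i+1} \cap N_j) + N_{j+1}\big) && \text{by Lemma 10.5} \\
&= N_{ji}/N_{j,i+1} && \text{by definition,}
\end{aligned}$$

and thus the above refinements are equivalent. □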
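The construction in the proof can be traced numerically for $R = \mathbb{Z}$ and $M = \mathbb{Z}/n$. In this sketch (names hypothetical) a filtration is assumed to be given as a divisor chain $1 = a_0 \mid a_1 \mid \cdots \mid a_m = n$, where $a_i$ encodes the subgroup $a_i\mathbb{Z}/n\mathbb{Z}$; sums and intersections are then gcd and lcm, and the order of a quotient $A/B$ is the ratio of the codes. The check confirms that the quotient $M_{ij}/M_{i,j+1}$ has the same order as $N_{ji}/N_{j,i+1}$, which for cyclic quotients means they are isomorphic.

```python
from math import gcd


def lcm(a: int, b: int) -> int:
    return a * b // gcd(a, b)


def schreier_quotients(a: list[int], b: list[int]):
    """Filtrations of Z/n given as divisor chains 1 = a[0] | a[1] | ... | a[-1] = n,
    where a[i] encodes the subgroup a[i]Z/nZ (a[0] is the whole group, a[-1] is 0).
    Returns the orders of M_ij/M_i,j+1 and of N_ji/N_j,i+1 as dictionaries."""
    m, k = len(a) - 1, len(b) - 1
    # M_ij = (M_i ∩ N_j) + M_{i+1}
    M = {(i, j): gcd(lcm(a[i], b[j]), a[i + 1]) for i in range(m) for j in range(k + 1)}
    # N_ji = (M_i ∩ N_j) + N_{j+1}
    N = {(j, i): gcd(lcm(a[i], b[j]), b[j + 1]) for j in range(k) for i in range(m + 1)}
    qm = {(i, j): M[i, j + 1] // M[i, j] for i in range(m) for j in range(k)}
    qn = {(j, i): N[j, i + 1] // N[j, i] for j in range(k) for i in range(m)}
    return qm, qn


qm, qn = schreier_quotients([1, 2, 12], [1, 3, 12])  # two filtrations of Z/12
print(all(qm[i, j] == qn[j, i] for (i, j) in qm))   # True
print(sorted(qm.values()))                          # [1, 2, 2, 3]
```

The quotients of order 1 are the zero consecutive quotients that the proof of Corollary 10.7 discards.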

Corollary 10.7 (Jordan-Hölder Theorem). If M is a unital left R module with a composition series, then

(a) any finite filtration of M in which all consecutive quotients are nonzero can be refined to a composition series, and

(b) any two composition series of M are equivalent.

PROOF. We apply Theorem 10.6 to a given filtration and a known composition series. After discarding redundant terms from each refinement (those that lead to 0 as a consecutive quotient), we arrive at a refinement of our given finite filtration that is equivalent to the known composition series. Hence the refinement is a composition series. This proves (a). If we specialize this argument to the case that the given filtration is a composition series, then we obtain (b). □
Corollary 10.7 implies that the composition factors for a given composition series depend, up to isomorphism and order, only on M, not on the particular composition series. Moreover, if $M' \supseteq M''$ are R submodules of an M with a composition series such that $M'/M''$ is simple, then $M'/M''$ is a composition factor of M. This fact follows by eliminating redundant terms from the finite filtration $M \supseteq M' \supseteq M'' \supseteq 0$ and applying Corollary 10.7a to the result.
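For $R = \mathbb{Z}$ the corollary can be seen concretely in $\mathbb{Z}/n$. Under the same assumed encoding of subgroups $d\mathbb{Z}/n\mathbb{Z}$ by divisors d of n, a composition series is a divisor chain in which each step multiplies by a prime p, and the corresponding composition factor is $\mathbb{Z}/p$. The hypothetical helpers below enumerate every composition series of $\mathbb{Z}/n$ and confirm that they all have the same multiset of composition factors.

```python
def is_prime(p: int) -> bool:
    return p >= 2 and all(p % q for q in range(2, int(p ** 0.5) + 1))


def composition_series(n: int, d: int = 1) -> list[list[int]]:
    """All composition series of Z/n from the subgroup dZ/nZ down to 0,
    as divisor chains d = d_0 | d_1 | ... | d_r = n in which every ratio
    d_{i+1}/d_i is prime, so every consecutive quotient is some Z/p."""
    if d == n:
        return [[n]]
    chains = []
    for p in range(2, n // d + 1):
        if is_prime(p) and (n // d) % p == 0:
            chains += [[d] + rest for rest in composition_series(n, d * p)]
    return chains


def factor_orders(chain: list[int]) -> tuple[int, ...]:
    """Sorted orders of the composition factors of a chain."""
    return tuple(sorted(b // a for a, b in zip(chain, chain[1:])))


series = composition_series(60)
print(len(series))                         # 12
print({factor_orders(c) for c in series})  # {(2, 2, 3, 5)}
```

The twelve series differ only in the order in which the factors $\mathbb{Z}/2, \mathbb{Z}/2, \mathbb{Z}/3, \mathbb{Z}/5$ appear.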

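If a unital left R module M has a composition series, then we say that M has finite length. This notion is closed under passage to submodules and quotients. In fact, if

$$M = M_0 \supseteq M_1 \supseteq \cdots \supseteq M_n = 0$$

is a composition series of M and if M' is an R submodule of M, then

$$M' = M_0 \cap M' \supseteq M_1 \cap M' \supseteq \cdots \supseteq M_n \cap M' = 0$$

is a finite filtration of M' in which each consecutive quotient is simple or 0, since by the Second Isomorphism Theorem $(M_i \cap M')/(M_{i+1} \cap M')$ is isomorphic to the R submodule $((M_i \cap M') + M_{i+1})/M_{i+1}$ of the simple module $M_i/M_{i+1}$. Discarding redundant terms (which lead to 0 as a consecutive quotient), we obtain a composition series for M'. A similar argument works for M/M'.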

Let us see that if the unital left R modules M' and M/M' have finite length, then so does M. In fact, we take a composition series for M/M', pull it back to M, and concatenate it to a composition series for M'. The result is a composition series for M, and the assertion follows. In particular, the direct sum of two unital left R modules of finite length has finite length.

If M has a composition series of the form $M = M_0 \supseteq M_1 \supseteq \cdots \supseteq M_n = 0$, then we say that M has length n. If it has no composition series, we say it has infinite length. According to Corollary 10.7, this notion of length is independent of the particular composition series that we use. The argument in the previous paragraph shows that if M' is an R submodule of M, then

$$\operatorname{length}(M) = \operatorname{length}(M') + \operatorname{length}(M/M'),$$

with the finiteness of either side implying the finiteness of the other side. One consequence is that if M' is a length-n submodule of a length-n module M with n finite, then M' = M. Another consequence is that if M is a semisimple left R module, then M has a composition series if and only if M is the finite direct sum of simple left R modules.
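The additivity of length can be checked on cyclic groups. Assuming, as for any composition series of $\mathbb{Z}/n$, that each composition factor is $\mathbb{Z}/p$ for a prime p, the length of $\mathbb{Z}/n$ is the number of prime factors of n counted with multiplicity. The submodule $d\mathbb{Z}/n\mathbb{Z}$ is cyclic of order $n/d$ and the quotient by it is cyclic of order d, so the identity becomes a statement about prime factorizations. The helper names are hypothetical.

```python
def length_of_cyclic(n: int) -> int:
    """Length of Z/n as a Z-module: the number of prime factors of n
    counted with multiplicity."""
    count, p = 0, 2
    while n > 1:
        while n % p == 0:
            n //= p
            count += 1
        p += 1
    return count


def additivity_holds(n: int) -> bool:
    """Check length(M) = length(M') + length(M/M') for M = Z/n and every
    submodule M' = dZ/nZ (cyclic of order n/d, with M/M' cyclic of order d)."""
    return all(
        length_of_cyclic(n) == length_of_cyclic(n // d) + length_of_cyclic(d)
        for d in range(1, n + 1)
        if n % d == 0
    )


print(length_of_cyclic(360))                            # 6
print(all(additivity_holds(n) for n in range(1, 200)))  # True
```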

From the last of these observations, we see that if F is a field, then the vector spaces over F that have a composition series are the finite-dimensional vector spaces, and in this case the length of the vector space is its dimension. The structure of finite-dimensional vector spaces is so elementary that the Jordan-Hölder Theorem is of no interest in this case, and it was for that reason that no version of the Jordan-Hölder Theorem for vector spaces appeared earlier in the book.
In the case that R = D is a division ring, matters are slightly subtler. We know from Example 2 in Section 1 that every unital left D module is semisimple, and we noted that such D modules are therefore called left vector spaces. Corollary 10.7 shows that the number of summands in any decomposition of a left vector space V as the direct sum of simple D modules is either an integer $n \ge 0$ independent of the decomposition or is infinite, independently of the decomposition. This number, the integer n or $\infty$, is called the dimension of the left vector space V.

We saw one other example of a semisimple left R module. Specifically, if D is a division ring, then we saw in Example 4 of Section 1 that $R = M_n(D)$ is semisimple as a left R module. The number of simple summands is n, and hence R has length n. So R has a composition series when considered as a left R module.

There are two other cases in which composition series give something familiar. One is the case that R is the ring $\mathbb{Z}$ of integers. A unital $\mathbb{Z}$ module is an abelian group, and we know that the simple abelian groups are the cyclic groups of prime order. For an abelian group with a composition series, the order of the group is the product of the orders of the consecutive quotients and hence is finite. Consequently an abelian group has a composition series if and only if it is a finite abelian group. Such a group need not be semisimple; the group $C_4$, for example, is not the direct sum of cyclic groups of prime order.
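One way to see the failure of semisimplicity for $C_4 = \mathbb{Z}/4$ uses the criterion, taken as known here, that in a semisimple module every submodule is a direct summand. The hypothetical helper below searches all subgroups of $\mathbb{Z}/n$ (the sets $d\mathbb{Z}/n\mathbb{Z}$ for divisors d of n) for a complement K of a given subgroup H, meaning $H \cap K = 0$ and $H + K = \mathbb{Z}/n$.

```python
def subgroups_of_cyclic(n: int) -> list[frozenset[int]]:
    """The subgroups of Z/n: exactly the sets dZ/nZ for divisors d of n."""
    return [frozenset(range(0, n, d)) for d in range(1, n + 1) if n % d == 0]


def has_complement(n: int, h: frozenset[int]) -> bool:
    """Is there a subgroup K of Z/n with H ∩ K = 0 and H + K = Z/n?"""
    whole = frozenset(range(n))
    for k in subgroups_of_cyclic(n):
        total = frozenset((x + y) % n for x in h for y in k)
        if h & k == {0} and total == whole:
            return True
    return False


print(has_complement(4, frozenset({0, 2})))  # False: C4 is not semisimple
print(has_complement(6, frozenset({0, 3})))  # True: Z/6 = {0, 3} ⊕ {0, 2, 4}
```

The subgroup $\{0, 2\}$ is the only subgroup of order 2 in $C_4$, so it cannot be complemented, whereas $\mathbb{Z}/6$ splits as a direct sum of cyclic groups of prime order.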

The other case concerns triangular form, Jordan canonical form, and related decompositions, as explained in Sections V.3 and V.6 and as reinterpreted after Corollary 8.29. Let V be a finite-dimensional vector space over a field K, and let $L : V \to V$ be a linear mapping from V to itself. Put $R = K[X]$, and make V into a unital R module by the definition $A(X)(v) = A(L)v$ for any $A(X)$ in $K[X]$ and v in V. The R submodules are the vector subspaces of V that are invariant under L. The finite dimensionality of V forces V to have a composition series as an R module. Let us suppose for a moment that K is algebraically closed. Proposition 5.6 says that the matrix of L in some ordered basis is upper triangular, and linear combinations of the first k vectors in this basis form an invariant subspace under L of dimension k. These subspaces are nested, and thus we obtain a composition series. Thus obtaining a composition series when K is algebraically closed is equivalent to obtaining triangular form. The existence of Jordan form is a finer result. The discussion after Corollary 8.29 shows that V is a finite direct sum of R modules $R/(X - c_j)^{k_j}$ with $c_j$ in K and $k_j > 0$. For each of these, the discussion at the end of Section VIII.6 shows how to refine $R/(X - c_j)^{k_j}$ to a composition series for which there is an R submodule of each possible dimension from 0 to $k_j$: the finer structure is hidden in the way that each invariant subspace is obtained from the next smaller invariant subspace. If K is not necessarily algebraically closed, then $(X - c_j)^{k_j}$ is to be replaced by $P_j(X)^{k_j}$ for some prime polynomial $P_j(X)$, and the consecutive quotients for $R/(P_j(X)^{k_j})$ have dimension equal to the degree of $P_j(X)$.
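To make the triangular-form remark concrete for a single Jordan block, the sketch below uses integer matrices stored as lists of rows (integers are an assumption made only to keep arithmetic exact). It checks that each span of the first j basis vectors is invariant, and that $J - c$ maps each basis vector into the span of the earlier ones, so X acts by c on every one-dimensional consecutive quotient, which is therefore $R/(X - c)$. Indices in the code start at 0, and all function names are hypothetical.

```python
def jordan_block(c: int, k: int) -> list[list[int]]:
    """k-by-k upper triangular Jordan block: c on the diagonal, 1 just above it."""
    return [[c if i == j else (1 if j == i + 1 else 0) for j in range(k)] for i in range(k)]


def apply(A: list[list[int]], v: list[int]) -> list[int]:
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def basis_vector(j: int, k: int) -> list[int]:
    return [1 if i == j else 0 for i in range(k)]


def is_flag_invariant(A: list[list[int]]) -> bool:
    """Is span(e_0, ..., e_j) invariant under A for every j?
    Equivalently, A e_j has no components past position j."""
    k = len(A)
    return all(all(x == 0 for x in apply(A, basis_vector(j, k))[j + 1:]) for j in range(k))


def acts_by_c_on_quotients(A: list[list[int]], c: int) -> bool:
    """Does A - c map e_j into span(e_0, ..., e_{j-1}) for every j?"""
    k = len(A)
    for j in range(k):
        e = basis_vector(j, k)
        image = [y - c * x for y, x in zip(apply(A, e), e)]
        if any(image[j:]):
            return False
    return True


J = jordan_block(5, 4)
print(is_flag_invariant(J), acts_by_c_on_quotients(J, 5))  # True True
print(is_flag_invariant([[0, 1], [1, 0]]))                 # False
```

The nested invariant subspaces have every dimension from 0 to 4, matching the refinement of $R/(X - c)^{k}$ described above with $k = 4$.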
3. Chain Conditions

We continue with R as a ring with identity, and we work with the category of all unital left R modules. Except in special cases we did not address conditions in Section 2 under which a unital left R module M has a composition series. In this section we shall see that a necessary and sufficient condition for M to have a composition series is that it satisfy two “chain conditions,” an ascending one and a descending one, that we shall define. We already encountered the ascending chain condition in Proposition 8.30 for the special case that R is a commutative ring with identity, and the proof for general R requires only cosmetic changes.

Proposition 10.8. If R is a ring with identity and M is a unital left R module, then the following conditions on R submodules of M are equivalent:

(a) (ascending chain condition) every strictly ascending chain of R submodules $M_1 \subsetneq M_2 \subsetneq \cdots$ terminates in finitely many steps,

(b) (maximum condition) every nonempty collection of R submodules has a maximal element under inclusion,

(c) (finite basis condition) every R submodule is finitely generated.

Proof. To see that (a) implies (b), let S be a nonempty collection of R submodules of M. Take $M_1$ in S. If $M_1$ is not maximal, choose $M_2$ in S properly containing $M_1$. If $M_2$ is not maximal, choose $M_3$ in S properly containing $M_2$. Continue in this way. By (a), this process must terminate, and then we have found a maximal R submodule in S.

To see that (b) implies (c), let N be an R submodule of M, and let S be the collection of all finitely generated R submodules of N. This collection is nonempty since 0 is in it. By (b), S has a maximal element, say N'. If x is in N but x is not in N', then $N' + Rx$ is a finitely generated R submodule of N that properly contains N' and therefore gives a contradiction. We conclude that N' = N, and therefore N is finitely generated.

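To see that (c) implies (a), let $M_1 \subseteq M_2 \subseteq \cdots$ be given, and put $N = \bigcup_{n=1}^{\infty} M_n$, which is an R submodule because the $M_n$ are increasing. By (c), N is finitely generated. Since the $M_n$ are increasing with n, we can find some $M_{n_0}$ containing all the generators. Then $M_{n_0} = N$, and the sequence stops no later than at $M_{n_0}$. □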
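The ring $\mathbb{Z}$, viewed as a module over itself, illustrates the proposition: every ideal is principal, so the finite basis condition (c) holds and ascending chains stabilize, while $2\mathbb{Z} \supsetneq 4\mathbb{Z} \supsetneq 8\mathbb{Z} \supsetneq \cdots$ shows that the descending chain condition fails. The sketch below (helper name hypothetical) records the ascending chain of ideals $(a_1, \dots, a_k)$ through their nonnegative generators, computed as running gcds. Each strict step replaces the generator by a proper divisor, so once the generator is nonzero only finitely many strict steps can occur.

```python
from math import gcd


def ideal_chain(generators: list[int]) -> list[int]:
    """For I_k = (a_1, ..., a_k) in Z, return the nonnegative generator of each I_k.
    Since I_1 ⊆ I_2 ⊆ ..., every nonzero generator divides the one before it."""
    chain, g = [], 0
    for a in generators:
        g = gcd(g, a)
        chain.append(g)
    return chain


print(ideal_chain([360, 84, 90, 1000, 7, 12]))  # [360, 12, 6, 2, 1, 1]
```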

The corresponding result for descending chains is as follows.

Proposition 10.9. If R is a ring with identity and M is a unital left R module, then the following conditions on R submodules of M are equivalent:

(a) (descending chain condition) every strictly descending chain of R submodules $M_1 \supsetneq M_2 \supsetneq \cdots$ terminates in finitely many steps,

(b) (minimum condition) every nonempty collection of R submodules has a minimal element under inclusion.
Proof. To see that (a) implies (b), let S be a nonempty collection of R submodules of M. Take $M_1$ in S. If $M_1$ is not minimal, choose $M_2$ in S properly contained in $M_1$. If $M_2$ is not minimal, choose $M_3$ in S properly contained in $M_2$. Continue in this way. By (a), this process must terminate, and then we have found a minimal R submodule in S.

To see that (b) implies (a), we observe that the members of any strictly descending chain that does not terminate would form a family without a minimal element. Since (b) says that any nonempty family has a minimal element, there can be no such chain. □

Proposition 10.10. Let R be a ring with identity, let M be a unital left R module, and let N be an R submodule of M. Then

(a) M satisfies the ascending chain condition if and only if N and M/N satisfy the ascending chain condition,

(b) M satisfies the descending chain condition if and only if N and M/N satisfy the descending chain condition.

PROOF. We prove (a), and the proof of (b) is completely similar. Suppose M satisfies the ascending chain condition and hence also the maximum condition by Proposition 10.8. The R submodules of N are in particular R submodules of M and hence satisfy the maximum condition. The R submodules of M/N lift back to R submodules of M containing N, and they too must satisfy the maximum condition. By Proposition 10.8, N and M/N satisfy the ascending chain condition.

Conversely suppose that N and M/N satisfy the ascending chain condition. Let $\{M_i\}$ be an ascending chain of R submodules of M; we are to show that $\{M_i\}$ is constant from some point on. Since N and M/N satisfy the ascending chain condition, we can find an n such that

$$M_{n+k} \cap N = M_n \cap N \quad \text{and} \quad (M_{n+k} + N)/N = (M_n + N)/N$$

for all $k \ge 0$. Combining the Second Isomorphism Theorem (Theorem 8.4) and the first of these identities gives

$$(M_{n+k} + N)/N \cong M_{n+k}/(M_{n+k} \cap N) = M_{n+k}/(M_n \cap N)$$

for all $k \ge 0$. Combining this result and two applications of the second of the identities gives

$$M_{n+k}/(M_n \cap N) \cong M_n/(M_n \cap N).$$

All of these isomorphisms are compatible with the inclusion $M_n \subseteq M_{n+k}$, and hence the submodule $M_n/(M_n \cap N)$ of $M_{n+k}/(M_n \cap N)$ is the whole module. The First Isomorphism Theorem (Theorem 8.3) shows that

$$\big(M_{n+k}/(M_n \cap N)\big)/\big(M_n/(M_n \cap N)\big) \cong M_{n+k}/M_n.$$

Since the left side is the 0 module, the right side is the 0 module. Therefore $M_{n+k} = M_n$ for all $k \ge 0$. □
Proposition 10.11. If R is a ring with identity and M is a unital left R module, then M has a composition series if and only if M satisfies both the ascending chain condition and the descending chain condition.

PROOF. If M has a composition series of length n, then the Jordan-Hölder Theorem (Corollary 10.7a) shows that every finite filtration of M with nonzero consecutive quotients has length $\le n$, and hence M satisfies both chain conditions.

Conversely suppose that M satisfies both chain conditions. By the maximum condition, choose if possible a maximal proper R submodule $N_1$ of M, then choose if possible a maximal proper R submodule $N_2$ of $N_1$, and so on. If all these choices are possible, we obtain a strictly descending chain $M \supsetneq N_1 \supsetneq N_2 \supsetneq \cdots$, and the consecutive quotients will be simple at each stage. The minimum condition says that we cannot have such a chain, and thus the choice is impossible for the first time at some stage k. Since the maximum condition provides a maximal proper R submodule of any nonzero R submodule (the collection of its proper R submodules is nonempty, containing 0), that means that $N_k$ has no proper R submodule, and $N_k$ must be 0. Then $M \supseteq N_1 \supseteq N_2 \supseteq \cdots \supseteq N_k = 0$ is a composition series. □
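The second half of the proof is effectively an algorithm, and it can be run inside $\mathbb{Z}/n$. In this sketch (names hypothetical) the subgroup $d\mathbb{Z}/n\mathbb{Z}$ is encoded by the divisor d; its maximal proper subgroups are assumed to be $dp\mathbb{Z}/n\mathbb{Z}$ for primes p dividing $n/d$, since a cyclic group of order $n/d$ has its maximal subgroups exactly at prime index. The choice of the smallest such prime is arbitrary; any choice yields a composition series.

```python
def smallest_prime_factor(m: int) -> int:
    p = 2
    while m % p:
        p += 1
    return p


def greedy_composition_series(n: int) -> list[int]:
    """Mimic the proof of Proposition 10.11 in Z/n: starting from the whole group,
    repeatedly pass to a maximal proper subgroup until reaching 0.  The subgroup
    dZ/nZ is encoded by d; this sketch always picks the smallest prime p dividing
    n/d, passing from dZ/nZ to dpZ/nZ."""
    chain, d = [1], 1
    while d != n:
        d *= smallest_prime_factor(n // d)
        chain.append(d)
    return chain


print(greedy_composition_series(90))  # [1, 2, 6, 18, 90]
```

Termination here comes from the finiteness of $\mathbb{Z}/n$, which plays the role of the minimum condition in the proof.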


4. Hom and End for Modules

We continue to work with the category $\mathcal{C}$ of unital left R modules, where R is a ring with identity, not necessarily commutative. Our interest in this section is with $\operatorname{Hom}_R(M, N)$ and $\operatorname{End}_R(M)$, where M and N are modules in $\mathcal{C}$. Recall from Section 1 that $\operatorname{Hom}_R(M, N)$ is a unital Z module, where Z is the center of R, and that $\operatorname{End}_R(M)$ is a Z algebra, the multiplication being composition. We shall tend to ignore Z except when R is commutative or R is an associative algebra over a field. However, Z will implicitly play a role in the context of bimodules, which we introduce near the end of this section.

In this section we shall be interested in interactions of $\operatorname{Hom}_R(M, N)$ and $\operatorname{End}_R(M)$ within the category $\mathcal{C}$, in identities that they satisfy, in the naturality of such identities, and in the use of $\operatorname{Hom}_R(M, N)$ in “change of rings,” also known as “extension of scalars.” The next section will carry out a similar investigation for a notion of tensor product that generalizes the tensor products in Chapter VI, and we shall obtain in addition one important formula involving Hom and tensor products at the same time. Finally in Section 6 we shall examine the effect of Hom and tensor product on “exact sequences.”

The first observation is that $\operatorname{Hom}_R$ is a functor, either a functor of one variable with the other variable held fixed or, less satisfactorily, a functor of two variables. To be precise, let $\mathcal{D}$ be the category of all abelian groups. For fixed M in $\operatorname{Obj}(\mathcal{C})$, we define

$$F(N) = \operatorname{Hom}_R(M, N).$$
If $\varphi$ is in $\operatorname{Hom}_R(N, N')$, we define $F(\varphi)$ in $\operatorname{Hom}_{\mathbb{Z}}(\operatorname{Hom}_R(M, N), \operatorname{Hom}_R(M, N'))$ by the formula

$$F(\varphi)(\tau) = \varphi\tau \quad \text{for } \tau \in \operatorname{Hom}_R(M, N),$$

where $\varphi\tau$ denotes the composition of $\tau$ followed by $\varphi$. In other words, $F(\varphi)$ is given by postmultiplication by $\varphi$. By inspection we see that $F(1_N)$ is the identity from $\operatorname{Hom}_R(M, N)$ to itself if $1_N$ is the identity on N and that $F(\varphi'\varphi) = F(\varphi')F(\varphi)$ if $\varphi'$ is in $\operatorname{Hom}_R(N', N'')$; the latter formula comes down to the associativity formula $(\varphi'\varphi)\tau = \varphi'(\varphi\tau)$ for functions under composition. Therefore F is a covariant functor from the category $\mathcal{C}$ to the category $\mathcal{D}$. We write $\operatorname{Hom}(1, \varphi)$ for $F(\varphi)$, so that $\operatorname{Hom}(1, \varphi)(\tau) = \varphi\tau$.

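Similarly for fixed N in $\operatorname{Obj}(\mathcal{C})$, we define

$$G(M) = \operatorname{Hom}_R(M, N).$$

On morphisms, G is given by premultiplication. Specifically for a morphism $\psi$ in $\operatorname{Hom}_R(M, M')$, we define $G(\psi)$ in $\operatorname{Hom}_{\mathbb{Z}}(\operatorname{Hom}_R(M', N), \operatorname{Hom}_R(M, N))$ by the formula

$$G(\psi)(\tau) = \tau\psi \quad \text{for } \tau \in \operatorname{Hom}_R(M', N).$$

We readily check that G is a contravariant functor from $\mathcal{C}$ to $\mathcal{D}$. We write $\operatorname{Hom}(\psi, 1)$ for $G(\psi)$, so that $\operatorname{Hom}(\psi, 1)(\tau) = \tau\psi$.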

To create a single functor H from F and G, we can try to define a functor H from $\mathcal{C}^2$ to $\mathcal{D}$ by $H(M, N) = \operatorname{Hom}_R(M, N)$. If $\varphi \in \operatorname{Hom}_R(N, N')$ and $\psi \in \operatorname{Hom}_R(M, M')$ are given, we can try the formula $H(\psi, \varphi)(\tau) = \varphi\tau\psi$ as a definition for $\tau$ in $\operatorname{Hom}_R(M', N)$. The trouble is that H is mixed as contravariant in the first variable and covariant in the second variable. To get H to be covariant, we can use the same formulas but regard H as defined on $\mathcal{C}^{\mathrm{opp}} \times \mathcal{C}$, where $\mathcal{C}^{\mathrm{opp}}$ is the opposite category of $\mathcal{C}$, as defined in Problems 78-80 at the end of Chapter IV. But this is getting to be a complicated structure for describing something simple, and we shall simply avoid this construction altogether,² working with F or G as circumstances dictate.

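Even though we shall not work with H as a functor, it is convenient to combine $\operatorname{Hom}(1, \varphi)$ and $\operatorname{Hom}(\psi, 1)$ into a single definition of $\operatorname{Hom}(\psi, \varphi)$ as $\operatorname{Hom}(\psi, \varphi)(\tau) = \varphi\tau\psi$. In particular, $\operatorname{Hom}(1, \varphi)$ and $\operatorname{Hom}(\psi, 1)$ commute with each other; the commutativity follows from the associative law

$$\operatorname{Hom}(\psi, 1) \circ \operatorname{Hom}(1, \varphi)(\tau) = (\varphi\tau)\psi = \varphi(\tau\psi) = \operatorname{Hom}(1, \varphi) \circ \operatorname{Hom}(\psi, 1)(\tau).$$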
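The commutativity of $\operatorname{Hom}(1, \varphi)$ and $\operatorname{Hom}(\psi, 1)$ is nothing more than associativity of composition, which can be seen concretely by modeling maps as Python callables. The groups $\mathbb{Z}/4$ and $\mathbb{Z}/8$ and the particular maps `psi`, `tau`, `phi` below are hypothetical choices made for illustration; `hom(psi, phi)` implements $\tau \mapsto \varphi\tau\psi$.

```python
def hom(psi, phi):
    """Hom(psi, phi): tau |-> phi ∘ tau ∘ psi, with maps modeled as callables."""
    return lambda tau: (lambda x: phi(tau(psi(x))))


def identity(x):
    return x


# Hypothetical Z-module maps:  psi: Z/4 -> Z/8,  tau: Z/8 -> Z/8,  phi: Z/8 -> Z/4.
def psi(x):
    return (2 * x) % 8


def tau(x):
    return (3 * x) % 8


def phi(x):
    return x % 4


post = hom(identity, phi)  # Hom(1, phi): postmultiplication by phi
pre = hom(psi, identity)   # Hom(psi, 1): premultiplication by psi

a = [pre(post(tau))(x) for x in range(4)]  # (phi tau) psi on Z/4
b = [post(pre(tau))(x) for x in range(4)]  # phi (tau psi) on Z/4
print(a, b)  # [0, 2, 0, 2] [0, 2, 0, 2]
```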

²In category theory one sometimes proceeds in another way, defining a “bifunctor” to be a functor-like thing depending on two variables, covariant or contravariant in each but maybe not the same in each, and satisfying an appropriate commutativity property for the two variables.
Now let us turn to three identities involving $\operatorname{Hom}_R$ and to their ramifications. Each identity will assert some isomorphism involving Hom, and we consider each side of the identity as the value of a functor. We shall be interested in knowing that the isomorphism is natural in each case, the notion of naturality having been defined in Section VI.6. The naturality need be proved in just one direction in each case, since the inverse of an isomorphism that is natural is an isomorphism that is natural.

The first two identities concern the interaction of $\operatorname{Hom}_R$ with direct products and direct sums. Direct products and direct sums of unital left R modules were defined in Examples 7 and 8 of modules in Section VIII.1, and they were seen to be the product and coproduct functors for the category $\mathcal{C}$. If S is a nonempty set, then the direct product $\prod_{s \in S} M_s$ of a family of unital left R modules $\{M_s \mid s \in S\}$ is the module whose underlying set is the Cartesian product of the sets $M_s$ and whose operations are defined coordinate by coordinate. The direct sum $\bigoplus_{s \in S} M_s$ is the R submodule of elements of $\prod_{s \in S} M_s$ that are nonzero in only finitely many coordinates.

Proposition 10.12. Let S be a nonempty set, let $M_s$ and $N_s$ be unital left R modules for each $s \in S$, and let M and N be unital left R modules. Then there are isomorphisms of abelian groups

(a) $\operatorname{Hom}_R(\bigoplus_{s \in S} M_s, N) \cong \prod_{s \in S} \operatorname{Hom}_R(M_s, N)$,

(b) $\operatorname{Hom}_R(M, \prod_{s \in S} N_s) \cong \prod_{s \in S} \operatorname{Hom}_R(M, N_s)$.

Moreover, the isomorphism in (a) is natural in the variable $\{M_s\}_{s \in S}$ and in the variable N, and the isomorphism in (b) is natural in the variable M and in the variable $\{N_s\}_{s \in S}$.

Remarks. In each instance the assertion of naturality is that some square diagram is commutative, as illustrated in Figure 6.3. For example, if the mapping from left to right in the isomorphism (a) is denoted for fixed N by $\Phi_{\{M_s\}_{s \in S}}$, and if a system of R homomorphisms $\varphi_s : M_s \to M_s'$ is given, then one assertion of naturality for (a) is that $\Phi_{\{M_s\}_{s \in S}} \circ \operatorname{Hom}(\bigoplus_s \varphi_s, 1) = \{\operatorname{Hom}(\varphi_s, 1)\}_{s \in S} \circ \Phi_{\{M_s'\}_{s \in S}}$. The other says, if the isomorphism (a) is denoted for fixed $\bigoplus_s M_s$ by $\Phi_N$ and if $\psi : N \to N'$ is an R homomorphism, that $\Phi_{N'} \circ \operatorname{Hom}(1, \psi) = \operatorname{Hom}(1, \psi) \circ \Phi_N$. Two corresponding assertions are made about (b). To simplify the notation, we shall usually drop the subscripts from $\Phi$.

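PROOF. For (a), let $e_s : M_s \to \bigoplus_t M_t$ be the $s$th inclusion, and let $p_s : \bigoplus_t M_t \to M_s$ be the $s$th projection; the latter is defined as the restriction of the projection associated with the direct product. The map from left to right in (a) is given by $\Phi(\sigma) = \{\sigma \circ e_s\}_{s \in S}$ for $\sigma$ in $\operatorname{Hom}_R(\bigoplus_s M_s, N)$, and the expected formula for the inverse is $\Phi'(\{\tau_s\}_{s \in S}) = \sum_s (\tau_s \circ p_s)$; this sum is meaningful because, for each element x of $\bigoplus_s M_s$, only finitely many $p_s(x)$ are nonzero. Then we have

$$\Phi'(\Phi(\sigma)) = \Phi'(\{\sigma \circ e_s\}_s) = \sum_s \sigma \circ e_s \circ p_s = \sigma$$

and

$$\Phi(\Phi'(\{\tau_s\}_s)) = \Phi\Big(\sum_s (\tau_s \circ p_s)\Big) = \Big\{\Big(\sum_s (\tau_s \circ p_s)\Big) \circ e_t\Big\}_t = \{\tau_t \circ p_t \circ e_t\}_t = \{\tau_t\}_t.$$

Hence $\Phi$ is an isomorphism with inverse $\Phi'$.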

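Next let the system of $R$ homomorphisms $\varphi_s: M'_s\to M_s$ be given, let $\iota'_s: M'_s\to\bigoplus_t M'_t$ be the $s$th inclusion, and fix $N$. For $\sigma$ in $\operatorname{Hom}_R(\bigoplus_s M_s, N)$, we have

$$\begin{aligned}\{\operatorname{Hom}(\varphi_s, 1)\}_s(\Phi(\sigma)) &= \{\operatorname{Hom}(\varphi_s, 1)\}_s(\{\sigma\circ\iota_s\}_s) = \{\sigma\circ\iota_s\}_s\circ\{\varphi_s\}_s = \{\sigma\circ\iota_s\circ\varphi_s\}_s\\ &= \{\sigma\circ(\textstyle\bigoplus_t\varphi_t)\circ\iota'_s\}_s = \Phi\big(\sigma\circ\textstyle\bigoplus_t\varphi_t\big) = \Phi\big(\operatorname{Hom}(\textstyle\bigoplus_t\varphi_t, 1)(\sigma)\big).\end{aligned}$$

This proves naturality in the variable $\{M_s\}_s$. If an $R$ homomorphism $\psi: N\to N'$ is given and if $\sigma$ is in $\operatorname{Hom}_R(\bigoplus_s M_s, N)$, then

$$\Phi(\operatorname{Hom}(1, \psi)(\sigma)) = \Phi(\psi\circ\sigma) = \{\psi\circ\sigma\circ\iota_s\}_s = \operatorname{Hom}(1, \psi)(\{\sigma\circ\iota_s\}_s) = \operatorname{Hom}(1, \psi)(\Phi(\sigma)).$$

This proves naturality in the variable $N$.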

For (b), let $p_s: \prod_t N_t\to N_s$ be the $s$th projection. The map from left to right in (b) is given by $\Phi(\sigma) = \{p_s\circ\sigma\}_s$ for $\sigma$ in $\operatorname{Hom}_R(M, \prod_s N_s)$, and the inverse is given by $\Phi'(\{\tau_s\}_s) = \tau$, where $\tau(m) = \{\tau_s(m)\}_s$. The proof of naturality is similar to the corresponding proof in (a) and is omitted. □

One ramification of Proposition 10.12 is the correspondence of “linear” maps to matrices when the ring $R$ of scalars is noncommutative. If $R$ is a field and $V$ is an $n$-dimensional vector space over $R$, then we know that $\operatorname{End}_R(V)$ is isomorphic as an $R$ algebra to the space $M_n(R)$ of $n$-by-$n$ matrices over $R$, the isomorphism being fixed once we choose an ordered basis of $V$. Things are more subtle when $R$ is noncommutative.

Corollary 10.13. Let $V$ be a unital left $R$ module, and let $S$ be the ring $S = \operatorname{End}_R(V)$. For integers $m\ge 1$ and $n\ge 1$, there is a canonical isomorphism of abelian groups

$$\operatorname{Hom}_R(V^n, V^m)\cong M_{mn}(S)$$

such that composition of $R$ homomorphisms, given as a mapping

$$\operatorname{Hom}_R(V^n, V^m)\times\operatorname{Hom}_R(V^p, V^n)\to\operatorname{Hom}_R(V^p, V^m),$$

corresponds to matrix multiplication

$$M_{mn}(S)\times M_{np}(S)\to M_{mp}(S).$$

In particular, in the special case that $m = n$, this canonical isomorphism becomes an isomorphism of rings

$$\operatorname{End}_R(V^n)\cong M_n(S).$$

Remarks. For $V = R$, this isomorphism takes the form

$$\operatorname{End}_R(R^n)\cong M_n(\operatorname{End}_R(R))$$

and looks like something familiar from the case that $R$ is a field. If $\operatorname{End}_R(R)$ were to be isomorphic as a ring to $R$, then the correspondence would be exactly what we might expect between $R$ linear mappings from a free $R$ module of rank $n$ into itself and $n$-by-$n$ matrices with entries in $R$. However, $\operatorname{End}_R(R)$ is not ordinarily isomorphic to $R$, and the correspondence is something different and unexpected. We shall sort out these matters in Proposition 10.14 and Corollary 10.15.

Proof. Let $e_j: V\to V^n = \bigoplus_{i=1}^n V = \prod_{i=1}^n V$ be the $j$th inclusion for whatever $n$ is under discussion, and let $p_i: V^m\to V$ be the $i$th projection for whatever $m$ is under discussion. For $f$ in $\operatorname{Hom}_R(V^n, V^m)$, define $f_{ij} = p_i f e_j$. Then $f_{ij}$ is $R$ linear from $V$ into $V$, hence is in $S = \operatorname{End}_R(V)$. If also $g$ is in $\operatorname{Hom}_R(V^p, V^n)$, so that $f\circ g$ is in $\operatorname{Hom}_R(V^p, V^m)$, then the formula $\sum_{k=1}^n e_k p_k = 1$ on $V^n$ gives

$$(f\circ g)_{ij} = p_i f g e_j = \sum_{k=1}^n p_i f e_k p_k g e_j = \sum_{k=1}^n f_{ik}g_{kj}.$$

Thus $f\circ g$ corresponds to the matrix product $[f_{ij}][g_{ij}]$, and the mapping is a ring homomorphism. Since

$$\sum_{i,j} e_i f_{ij} p_j = \sum_{i,j} e_i p_i f e_j p_j = \Big(\sum_i e_i p_i\Big) f\Big(\sum_j e_j p_j\Big) = 1f1 = f,$$

the mapping is one-one. If an arbitrary member $[u_{ij}]$ of $M_{mn}(S)$ is given, then we can define $f = \sum_{k,l} e_k u_{kl} p_l$, obtain $f_{ij} = p_i f e_j = \sum_{k,l} p_i e_k u_{kl} p_l e_j = p_i e_i u_{ij} p_j e_j = u_{ij}$, and conclude that the mapping is onto. □
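The bookkeeping in this proof can be checked numerically. The following is a minimal sketch, assuming the special case $R = \mathbb Z$ and $V = \mathbb Z^2$, so that $S = \operatorname{End}_{\mathbb Z}(V)$ is the noncommutative ring of 2-by-2 integer matrices and a map $V^n\to V^m$ is a $2m$-by-$2n$ integer matrix. The helper names (`matmul`, `block`, `block_product`, `random_map`) are illustrative only. The code extracts $f_{ij} = p_i f e_j$ as 2-by-2 blocks and confirms that the blocks of $f\circ g$ are the sums $\sum_k f_{ik}g_{kj}$ computed in $S$.

```python
import random


def matmul(a, b):
    """Product of integer matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def matadd(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def block(f, i, j, d=2):
    """f_ij = p_i f e_j: the d-by-d block of f in block row i, block column j."""
    return [row[d * j:d * (j + 1)] for row in f[d * i:d * (i + 1)]]


def block_product(f, g, m, n, p, d=2):
    """Matrix product over S = End(Z^d): entry (i, j) is sum_k f_ik g_kj."""
    result = {}
    for i in range(m):
        for j in range(p):
            acc = [[0] * d for _ in range(d)]
            for k in range(n):
                acc = matadd(acc, matmul(block(f, i, k, d), block(g, k, j, d)))
            result[(i, j)] = acc
    return result


def random_map(rows, cols, seed):
    rng = random.Random(seed)
    return [[rng.randint(-5, 5) for _ in range(cols)] for _ in range(rows)]


m, n, p, d = 2, 3, 2, 2
f = random_map(d * m, d * n, seed=1)   # f in Hom(V^n, V^m)
g = random_map(d * n, d * p, seed=2)   # g in Hom(V^p, V^n)
fg = matmul(f, g)                      # the composition f o g
products = block_product(f, g, m, n, p, d)
agree = all(block(fg, i, j, d) == products[(i, j)]
            for i in range(m) for j in range(p))
print(agree)
```

Since $R = \mathbb Z$ is commutative here, the sketch illustrates only the passage from maps to matrices; the entries nevertheless lie in the noncommutative ring $S$, so the order $f_{ik}g_{kj}$ in each product matters.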

Proposition 10.14. The mapping $\varphi\mapsto\varphi(1)$ is a ring isomorphism $\operatorname{End}_R(R)\cong R^o$ of $\operatorname{End}_R(R)$ onto the opposite ring $R^o$ of $R$.

PROOF. The mapping $\varphi\mapsto\varphi(1)$ certainly respects addition. If $\varphi$ maps to $\varphi(1)$ and $\tau$ maps to $\tau(1)$, then $\varphi\tau$ maps to $(\varphi\tau)(1) = \varphi(\tau(1)) = \varphi(\tau(1)1) = \tau(1)\varphi(1)$ since $\varphi$ respects left multiplication by the element $\tau(1)$ of $R$. The order of multiplication is therefore reversed, and $\varphi\mapsto\varphi(1)$ is a ring homomorphism of $\operatorname{End}_R(R)$ into $R^o$.

If $r$ is given in $R^o$, define $\varphi_r(s) = sr$ for $s$ in $R$. Then $\varphi_r$ respects addition, and it respects left multiplication by $R$ because $\varphi_r(r's) = r'sr = r'\varphi_r(s)$. Therefore $\varphi_r$ is a member of $\operatorname{End}_R(R)$ such that $\varphi_r(1) = r$, and $\varphi\mapsto\varphi(1)$ is onto $R^o$.

If $\varphi$ in $\operatorname{End}_R(R)$ has $\varphi(1) = 0$, then the $R$ linearity of $\varphi$ implies that $\varphi(r) = \varphi(r1) = r\varphi(1) = r0 = 0$, so that $\varphi = 0$. Consequently the map $\varphi\mapsto\varphi(1)$ is one-one. □
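The reversal of order can be seen on a small example. The following minimal sketch assumes the noncommutative ring $R = M_2(\mathbb Z)$ of 2-by-2 integer matrices and uses the maps $\varphi_r(s) = sr$ from the proof; the names `mul`, `phi`, and `compose` are illustrative. It checks that $\varphi_r$ respects left multiplication and that $\varphi_r\circ\varphi_{r_2} = \varphi_{r_2r}$, so that composition in $\operatorname{End}_R(R)$ corresponds to multiplication in $R^o$.

```python
def mul(a, b):
    """Product in the noncommutative ring R = M_2(Z); matrices are tuples of rows."""
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
                 for i in range(2))


ONE = ((1, 0), (0, 1))


def phi(r):
    """phi_r(s) = s r, the element of End_R(R) with phi_r(1) = r."""
    return lambda s: mul(s, r)


def compose(f, g):
    return lambda s: f(g(s))


r = ((1, 1), (0, 1))
r2 = ((1, 0), (1, 1))
samples = [ONE, r, r2, ((2, -1), (3, 5))]

# phi_r respects left multiplication: phi_r(t s) = t phi_r(s).
print(all(phi(r)(mul(t, s)) == mul(t, phi(r)(s)) for t in samples for s in samples))
# phi_r o phi_r2 sends s to (s r2) r, so it equals phi_{r2 r}, not phi_{r r2}.
both = compose(phi(r), phi(r2))
print(all(both(s) == phi(mul(r2, r))(s) for s in samples))
print(both(ONE) == mul(r2, r), mul(r2, r) == mul(r, r2))
```

The last line prints `True False`: the composite evaluates at 1 to $r_2r$, and for these two matrices $r_2r \ne rr_2$.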


Corollary 10.15. For any integer $n\ge 1$, $\operatorname{End}_R(R^n)$ is ring isomorphic to $M_n(R^o)$.

Remarks. Now we can complete the remarks made with Corollary 10.13: the case in which $R$ is commutative might lead us to believe that $\operatorname{End}_R(R^n)$ is isomorphic to $M_n(R)$, but the correct isomorphism is with $M_n(R^o)$ instead.

PROOF. Corollary 10.13 shows that $\operatorname{End}_R(R^n)$ is isomorphic to $M_n(\operatorname{End}_R(R))$, and Proposition 10.14 shows that the latter ring is isomorphic to $M_n(R^o)$. □

The third identity involving $\operatorname{Hom}_R$ concerns $\operatorname{Hom}_R(R, M)$, where $M$ is a unital left $R$ module. Ordinarily $\operatorname{Hom}_R(N, M)$, when $N$ and $M$ are two unital left $R$ modules, is not an $R$ module, but in the case that $N = R$, it is. The definition of the scalar multiplication by $r\in R$ is $(r\varphi)(r') = \varphi(r'r)$ for $r'\in R$ and $\varphi\in\operatorname{Hom}_R(R, M)$. To see that $r\varphi$ is in $\operatorname{Hom}_R(R, M)$, we let $s$ be in $R$ and compute that $(r\varphi)(sr') = \varphi((sr')r) = \varphi(s(r'r)) = s(\varphi(r'r)) = s((r\varphi)(r'))$, as required. To see that $(sr)\varphi = s(r\varphi)$, we compute that $((sr)\varphi)(r') = \varphi(r'(sr)) = \varphi((r's)r) = (r\varphi)(r's) = (s(r\varphi))(r')$. Proposition 10.16 identifies $\operatorname{Hom}_R(R, M)$ as an $R$ module.


Proposition 10.16. For any unital left $R$ module $M$, there is a canonical $R$ isomorphism

$$\operatorname{Hom}_R(R, M)\cong M,$$

and this isomorphism is natural in the variable $M$.

PROOF. The map $\Phi$ from left to right is given by $\Phi(\sigma) = \sigma(1)$, and the inverse will be seen to be given by $\Phi'(m) = \tau_m$ with $\tau_m(r) = rm$. The computation $\Phi(r\sigma) = (r\sigma)(1) = \sigma(1r) = \sigma(r1) = r(\sigma(1)) = r(\Phi(\sigma))$ shows that $\Phi$ is an $R$ homomorphism, and the computation $\tau_m(sr) = (sr)m = s(rm) = s(\tau_m(r))$ shows that $\tau_m$ is in $\operatorname{Hom}_R(R, M)$.

To see that $\Phi$ is an isomorphism with inverse $\Phi'$, we observe that $\Phi'\Phi$ carries $\operatorname{Hom}_R(R, M)$ into itself and has $(\Phi'\Phi)(\sigma) = \Phi'(\sigma(1)) = \tau_{\sigma(1)}$, where $\tau_{\sigma(1)}(r) = r\sigma(1) = \sigma(r)$; thus $(\Phi'\Phi)(\sigma) = \sigma$, and $\Phi'\Phi$ is the identity. Also, $(\Phi\Phi')(m) = \Phi(\tau_m) = \tau_m(1) = 1m = m$, and $\Phi\Phi'$ is the identity.

For the naturality let $\varphi: M\to M'$ be an $R$ homomorphism. Then we have $\Phi(\operatorname{Hom}(1, \varphi)(\sigma)) = \Phi(\varphi\sigma) = \varphi\sigma(1) = \varphi(\Phi(\sigma))$, and naturality is proved. □
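For a concrete instance of Proposition 10.16, here is a minimal sketch assuming $R = M_2(\mathbb Z)$ and $M = \mathbb Z^2$ (column vectors, with $R$ acting on the left by matrix multiplication); the function names `tau`, `Phi`, and `scale` are hypothetical stand-ins for $\tau_m$, $\Phi$, and the scalar multiplication $(r\sigma)(r') = \sigma(r'r)$. It checks $\Phi(\Phi'(m)) = m$, that $\Phi(r\sigma) = r\Phi(\sigma)$, and that $s(r\sigma) = (sr)\sigma$ on sample arguments.

```python
def mat_mul(a, b):
    """Product in R = M_2(Z); matrices are tuples of row tuples."""
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
                 for i in range(2))


def act(r, m):
    """Left action of r in R on the column vector m in M = Z^2."""
    return tuple(sum(r[i][k] * m[k] for k in range(2)) for i in range(2))


ONE = ((1, 0), (0, 1))


def tau(m):
    """Phi'(m) = tau_m, where tau_m(r) = r m."""
    return lambda r: act(r, m)


def Phi(sigma):
    """Phi(sigma) = sigma(1)."""
    return sigma(ONE)


def scale(r, sigma):
    """The R module structure on Hom_R(R, M): (r sigma)(r') = sigma(r' r)."""
    return lambda rp: sigma(mat_mul(rp, r))


m = (3, -2)
r, s = ((0, 1), (1, 1)), ((2, 1), (-1, 4))
sigma = tau(m)
samples = [ONE, r, s]

print(Phi(sigma) == m)                          # Phi(Phi'(m)) = m
print(Phi(scale(r, sigma)) == act(r, m))        # Phi(r sigma) = r Phi(sigma)
print(all(scale(s, scale(r, sigma))(x) == scale(mat_mul(s, r), sigma)(x)
          for x in samples))                    # s(r sigma) = (s r) sigma
```

Note that the scalar multiplication multiplies on the right inside the argument; this is what makes $s(r\sigma) = (sr)\sigma$ hold rather than $(rs)\sigma$.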

A relevant observation about the construction whose result is identified in Proposition 10.16 is that we could get by with something more general than $R$ in the first variable of $\operatorname{Hom}_R$. In fact, the construction would have worked for $\operatorname{Hom}_R(P, M)$ for any unital $(R, R)$ “bimodule” $P$, i.e., any abelian group $P$ that is a unital left $R$ module and unital right $R$ module in such a way that the two actions commute: $(rp)r' = r(pr')$. More generally let $S$ be a second ring with identity. We say that $P$ is a unital $(R, S)$ bimodule if $P$ is simultaneously a unital left $R$ module and a unital right $S$ module in such a way that $(rp)s = r(ps)$ for $r\in R$, $s\in S$, and $p\in P$. The following proposition shows that $P$ allows us to construct a unital left $S$ module out of any unital left $R$ module $M$.

Proposition 10.17. If $R$ and $S$ are two rings with identity, if $P$ is a unital $(R, S)$ bimodule, and if $M$ is any unital left $R$ module, then the abelian group $\operatorname{Hom}_R(P, M)$ becomes a unital left $S$ module under the definition $(s\varphi)(p) = \varphi(ps)$ for $s\in S$, $\varphi\in\operatorname{Hom}_R(P, M)$, and $p\in P$.

Proof. To see that $s\varphi$ is an $R$ homomorphism, we compute that $(s\varphi)(rp) = \varphi((rp)s) = \varphi(r(ps)) = r(\varphi(ps)) = r((s\varphi)(p))$. It is clear that 1 acts as 1, and the distributive laws are routine. What needs checking is the formula $(ss')\varphi = s(s'\varphi)$ for $s$ and $s'$ in $S$ and $\varphi$ in $\operatorname{Hom}_R(P, M)$. We compute that $((ss')\varphi)(p) = \varphi(p(ss')) = \varphi((ps)s') = (s'\varphi)(ps) = (s(s'\varphi))(p)$, and the result follows. □

An example of a unital $(R, S)$ bimodule $P$ is a ring $S$ with identity such that $R$ is a subring of $S$ with the same identity. Then we can take $P = S$, with the result that $R$ acts on the left, $S$ acts on the right, and the two actions commute by the associative law for multiplication in $S$. In this situation the passage from $M$ to $\operatorname{Hom}_R(S, M)$ is called a change of rings, or extension of scalars, for $M$.

In the special case that the rings are fields and the modules are vector spaces, we saw a different kind of change of rings in Section VI.6. What we saw there is that if $K\subseteq L$ is an inclusion of fields and if $E$ is a vector space over $K$, then $E^L = E\otimes_K L$ has a canonical scalar multiplication by members of $L$ under the definition that multiplication by $c\in L$ is the linear mapping $1\otimes(l\mapsto cl)$. In the next section we shall see that this change of rings by means of tensor products for vector spaces generalizes to give a second construction of a change of rings for modules over a ring with identity.
5. Tensor Product for Modules

In this section, $R$ is still a ring with identity, and other rings will play a role as well. We are going to generalize the discussion of tensor products of Section VI.6, extending the notion from the tensor product of two vector spaces over a field to the tensor product of a unital right $R$ module and a unital left $R$ module. The tensor product will ordinarily not have the structure of an $R$ module; it will be just an abelian group. Additional structure on the tensor product will come from a bimodule structure on one or both of the given $R$ modules. For example, it will be seen that the tensor product, in the current sense, of two vector spaces over a field $F$ is a vector space over $F$ because both vector spaces can be regarded as unital bimodules over $F$. We return to this detail after giving the definition and the theorem. Later in this section we shall obtain two fundamental associativity formulas, one for triple tensor products and one involving tensor product and Hom together.

Let $M$ be a unital right $R$ module, and let $N$ be a unital left $R$ module. An $R$ bilinear function from $M\times N$ into an abelian group is a function $b$ such that

$$\begin{aligned} b(m_1 + m_2, n) &= b(m_1, n) + b(m_2, n) &&\text{for all } m_1\in M,\ m_2\in M,\ n\in N,\\ b(m, n_1 + n_2) &= b(m, n_1) + b(m, n_2) &&\text{for all } m\in M,\ n_1\in N,\ n_2\in N,\\ b(mr, n) &= b(m, rn) &&\text{for all } m\in M,\ n\in N,\ r\in R.\end{aligned}$$

The first two conditions are summarized by saying that $b$ is additive in each variable. A tensor product of $M$ and $N$ over $R$ is a pair $(V, \iota)$ consisting of an abelian group $V$ and an $R$ bilinear map $\iota: M\times N\to V$ having the following universal mapping property: whenever $b$ is an $R$ bilinear function from $M\times N$ into an abelian group $A$, then there exists a unique abelian-group homomorphism $L: V\to A$ such that the diagram in Figure 10.1 commutes, i.e., such that $L\iota = b$ holds in the diagram. When $\iota$ is understood, one frequently refers to $V$ itself as the tensor product. The abelian-group homomorphism $L: V\to A$ is called the additive extension of $b$ to the tensor product.³ Theorem 10.18 below will address existence and essential uniqueness of the tensor product. Because of the essential uniqueness, it is customary to denote a tensor product by $M\otimes_R N$, and Figure 10.1 incorporates this notation.⁴ The image $\iota(m, n)$ of the member $(m, n)$ of $M\times N$ under $\iota$ is denoted by $m\otimes n$.

³ Warning. The name “additive extension” is in analogy with the situation for the tensor product of vector spaces over a field, in which the extension is linear and really is an extension. Example 2 below will show that the tensor product of nonzero modules can be 0, and hence we do not always get something for general $R$ that we can regard intuitively as an extension.

⁴ Sometimes the notation $M\otimes_R N$ refers to the constructed abelian group in the proof of Theorem 10.18, and sometimes it refers to any abelian group as in the definition of tensor product.
[Figure: commutative triangle with $b: M\times N\to A$, $\iota: M\times N\to M\otimes_R N$, and $L: M\otimes_R N\to A$, satisfying $L\circ\iota = b$.]

Figure 10.1. Universal mapping property of a tensor product of a right $R$ module $M$ and a left $R$ module $N$.

Theorem 10.18. Let $R$ be a ring with identity. If $M$ is a unital right $R$ module and $N$ is a unital left $R$ module, then there exists a tensor product $(M\otimes_R N, \iota)$ of $M$ and $N$ over $R$, and it is unique in the following sense: if $(V_1, \iota_1)$ and $(V_2, \iota_2)$ are two tensor products, then there exists a unique abelian-group homomorphism $\Phi: V_1\to V_2$ such that $\Phi\circ\iota_1 = \iota_2$, and it is an isomorphism. Any tensor product is generated as an abelian group by the image of $M\times N$ in it. Moreover, tensor product is a covariant functor from the category of pairs consisting of a unital right $R$ module and a unital left $R$ module to the category of abelian groups under the following definition: if $\varphi: M\to M'$ is a homomorphism of unital right $R$ modules and $\psi: N\to N'$ is a homomorphism of unital left $R$ modules, then there exists a unique homomorphism of abelian groups $\varphi\otimes\psi: M\otimes_R N\to M'\otimes_R N'$ such that $(\varphi\otimes\psi)(m\otimes n) = \varphi(m)\otimes\psi(n)$ for all $m\in M$ and $n\in N$.

PROOF. Form the free abelian group $G$ with a $\mathbb Z$ basis parametrized by the elements of $M\times N$. We write $e(m, n)$ for the basis element in $G$ corresponding to the element $(m, n)$ of $M\times N$, and we regard $e$ as a one-one function from $M\times N$ onto the $\mathbb Z$ basis of $G$. Let $H$ be the subgroup of $G$ generated by all elements of any of the forms

$$\begin{aligned} &e(m_1 + m_2, n) - e(m_1, n) - e(m_2, n),\\ &e(m, n_1 + n_2) - e(m, n_1) - e(m, n_2),\\ &e(mr, n) - e(m, rn),\end{aligned}\tag{$*$}$$

where the elements $m, m_1, m_2$ are in $M$, the elements $n, n_1, n_2$ are in $N$, and the scalar $r$ is in $R$. We define $M\otimes_R N$ to be the quotient group $G/H$, $q: G\to G/H$ to be the quotient homomorphism, and $\iota$ to be the function $(m, n)\mapsto e(m, n) + H$ from $M\times N$ into $G/H$. The function $\iota$ is therefore given by $\iota = q\circ e$.

Let us prove that $(M\otimes_R N, \iota)$ is a tensor product of $M$ and $N$ over $R$. Each of the elements in $(*)$ lies in $H$ and hence is mapped by $q$ into the 0 coset of $G/H$. Since $q$ is a homomorphism and since $\iota = q\circ e$, we obtain

$$\iota(m_1 + m_2, n) = \iota(m_1, n) + \iota(m_2, n)$$

from the first relation in $(*)$ and similar equalities from the other two relations. Therefore $\iota: M\times N\to M\otimes_R N$ is an $R$ bilinear function.

Now let $b: M\times N\to A$ be an $R$ bilinear function from $M\times N$ into an abelian group $A$. The universal mapping property in Figure 8.2 for free abelian groups shows that there exists a unique group homomorphism $\widetilde L: G\to A$ such that $\widetilde L(e(m, n)) = b(m, n)$ for all $(m, n)$ in $M\times N$. For the first expression in $(*)$, we have

$$\begin{aligned}\widetilde L(e(m_1 + m_2, n) - e(m_1, n) - e(m_2, n)) &= \widetilde L(e(m_1 + m_2, n)) - \widetilde L(e(m_1, n)) - \widetilde L(e(m_2, n))\\ &= b(m_1 + m_2, n) - b(m_1, n) - b(m_2, n).\end{aligned}$$

The right side is 0 since $b$ is $R$ bilinear, and a similar conclusion applies to the other two expressions in $(*)$. Therefore each member of $(*)$ lies in the kernel of $\widetilde L$, and the generated subgroup $H$ lies in the kernel of $\widetilde L$. Consequently $\widetilde L$ descends to a group homomorphism $L: G/H\to A$, i.e., there exists $L$ with $\widetilde L = L\circ q$. On any element $(m, n)$ in $M\times N$, we then have $L\circ\iota = L\circ q\circ e = \widetilde L\circ e = b$. This proves the existence asserted by the universal mapping property for a tensor product over $R$. For the asserted uniqueness, the formula $L\circ\iota = b$ shows that $L$ is determined uniquely by $b$ on $\iota(M\times N)$. It is immediate from the definition of $M\otimes_R N$ that $\iota(M\times N)$ generates $M\otimes_R N$, and thus $L$ is determined uniquely on all of $M\otimes_R N$.

Therefore $(M\otimes_R N, \iota)$ is a tensor product. Problems 18-22 at the end of Chapter VI show that the uniqueness up to the asserted isomorphism follows from general category theory.

We are left with defining $\varphi\otimes\psi$ when $\varphi: M\to M'$ and $\psi: N\to N'$ are given, and with showing that this definition makes tensor product into a covariant functor. Define $b: M\times N\to M'\otimes_R N'$ by $b(m, n) = \varphi(m)\otimes\psi(n)$. Then $b$ is $R$ bilinear into an abelian group, the property $b(mr, n) = b(m, rn)$ being verified by the calculation

$$b(mr, n) = \varphi(mr)\otimes\psi(n) = \varphi(m)r\otimes\psi(n) = \varphi(m)\otimes r\psi(n) = \varphi(m)\otimes\psi(rn) = b(m, rn).$$

The additive extension of $b$ to $M\otimes_R N$ is taken to be $\varphi\otimes\psi$. The formula is therefore $(\varphi\otimes\psi)(m\otimes n) = \varphi(m)\otimes\psi(n)$. If we are given also $\varphi': M'\to M''$ and $\psi': N'\to N''$, then

$$(\varphi'\otimes\psi')(\varphi\otimes\psi)(m\otimes n) = (\varphi'\otimes\psi')(\varphi(m)\otimes\psi(n)) = \varphi'\varphi(m)\otimes\psi'\psi(n) = (\varphi'\varphi\otimes\psi'\psi)(m\otimes n).$$

Since the elements $m\otimes n$ generate $M\otimes_R N$, we obtain $(\varphi'\otimes\psi')(\varphi\otimes\psi) = \varphi'\varphi\otimes\psi'\psi$. Similarly we check that $1_M\otimes 1_N = 1_{M\otimes_R N}$. Therefore tensor product is a covariant functor. □
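For finite-dimensional vector spaces over a field with ordered bases chosen, $m\otimes n$ can be identified with the Kronecker product of coordinate vectors and $\varphi\otimes\psi$ with the Kronecker product of matrices, when the basis $\{e_i\otimes f_j\}$ is ordered lexicographically (an assumption of this sketch). The following minimal Python sketch, using integer matrices for exact arithmetic and illustrative helper names, checks the two properties established at the end of the proof: $(\varphi\otimes\psi)(m\otimes n) = \varphi(m)\otimes\psi(n)$ and $(\varphi'\otimes\psi')(\varphi\otimes\psi) = \varphi'\varphi\otimes\psi'\psi$.

```python
import random


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def kron(a, b):
    """Matrix of a (x) b in the lexicographically ordered basis e_i (x) f_k."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]


def apply(a, v):
    return [sum(a[i][j] * v[j] for j in range(len(v))) for i in range(len(a))]


def kron_vec(v, w):
    """Coordinates of v (x) w."""
    return [x * y for x in v for y in w]


rng = random.Random(0)


def rand(rows, cols):
    return [[rng.randint(-3, 3) for _ in range(cols)] for _ in range(rows)]


phi, psi = rand(3, 2), rand(2, 4)      # phi: M -> M' (dim 2 -> 3), psi: N -> N' (dim 4 -> 2)
phi2, psi2 = rand(2, 3), rand(3, 2)    # phi': M' -> M'', psi': N' -> N''
m, n = [1, -2], [0, 3, 1, 2]

# (phi (x) psi)(m (x) n) = phi(m) (x) psi(n)
print(apply(kron(phi, psi), kron_vec(m, n)) == kron_vec(apply(phi, m), apply(psi, n)))
# (phi' (x) psi')(phi (x) psi) = phi' phi (x) psi' psi
print(matmul(kron(phi2, psi2), kron(phi, psi)) == kron(matmul(phi2, phi), matmul(psi2, psi)))
```

In matrix language, the functor property of tensor product is the familiar mixed-product rule for Kronecker products.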
As in the last part of the above proof, the general procedure for constructing an abelian-group homomorphism $L: M\otimes_R N\to A$ is somehow to define an $R$ bilinear function $b: M\times N\to A$ and to take the additive extension from Theorem 10.18 as the desired homomorphism. Once one has observed that the expression $b(m, n)$ is of a form that makes it $R$ bilinear, then the homomorphism $L$ is defined and is uniquely determined by its values on elements $m\otimes n$, according to the theorem.

In practice, $M$ or $N$ often has some additional structure, and that structure may be reflected in some additional property of the tensor product. The corollary below addresses some situations of this kind.

Corollary 10.19. Let $R$, $S$, and $T$ be rings with identity, and suppose that $M$ is a unital right $R$ module and $N$ is a unital left $R$ module. Under the additional hypothesis that

(a) $M$ is a unital $(S, R)$ bimodule, then $M\otimes_R N$ is a unital left $S$ module in a unique way such that $s(m\otimes n) = sm\otimes n$ for all $m\in M$, $n\in N$, and $s\in S$,

(b) $N$ is a unital $(R, T)$ bimodule, then $M\otimes_R N$ is a unital right $T$ module in a unique way such that $(m\otimes n)t = m\otimes nt$ for all $m\in M$, $n\in N$, and $t\in T$,

(c) $M$ is a unital $(S, R)$ bimodule and $N$ is a unital $(R, T)$ bimodule, then $M\otimes_R N$ is a unital $(S, T)$ bimodule under the left $S$ module structure in (a) and the right $T$ module structure in (b).

PROOF. In (a), let left multiplication by $s\in S$ within $M$ be given by $\varphi_s: M\to M$ with $\varphi_s(m) = sm$; since $(sm)r = s(mr)$, each $\varphi_s$ is a homomorphism of right $R$ modules. Then multiplication by $s$ in $S$ within $M\otimes_R N$ is given by $\varphi_s\otimes 1$. The covariant-functor property makes $(\varphi_s\otimes 1)(\varphi_{s'}\otimes 1) = \varphi_{ss'}\otimes 1$ and $\varphi_1\otimes 1 = 1$, and the distributive properties follow from the definitions and the fact that each $\varphi_s$ is a homomorphism of the additive group $M$. This proves (a), and (b) is proved similarly. For (c), if left multiplication by $s\in S$ within $M$ is given by $\varphi_s$ and if right multiplication by $t\in T$ within $N$ is given by $\psi_t$, then the commutativity of the operations on $M\otimes_R N$ follows from the fact that the additive homomorphisms $\varphi_s\otimes 1$ and $1\otimes\psi_t$ commute with each other. □

Examples. 

(1) $R\otimes_R M\cong M$ as an isomorphism of left $R$ modules whenever $M$ is a unital left $R$ module. Here we regard $R$ as a unital $(R, R)$ bimodule, so that $R\otimes_R M$ has the structure of a unital left $R$ module by Corollary 10.19a. The mapping from left to right is the additive extension $\Phi$ of the $R$ bilinear function $b(r, m) = rm$, satisfying $\Phi(r\otimes m) = rm$. It respects the left action by $R$. The two-sided inverse $\Phi'$ to $\Phi$ is given by $\Phi'(m) = 1\otimes m$. Then $\Phi'\circ\Phi$ is the identity since $\Phi'(\Phi(r\otimes m)) = \Phi'(rm) = 1\otimes rm = r\otimes m$, and $\Phi\circ\Phi'$ is the identity since $\Phi(\Phi'(m)) = \Phi(1\otimes m) = 1m = m$. The $R$ isomorphism $R\otimes_R M\cong M$ is natural in $M$. In fact, if $\varphi: M\to M'$ is given, then

$$\varphi(\Phi(r\otimes m)) = \varphi(rm) = r\varphi(m) = \Phi(r\otimes\varphi(m)) = \Phi((1\otimes\varphi)(r\otimes m)).$$

(2) $R = \mathbb Z$. In this case, $M\otimes_{\mathbb Z} N$ is the tensor product of abelian groups. Let us consider what abelian group we obtain when $M$ and $N$ are both finitely generated. Proposition 10.21 below shows that direct sums pull out of any tensor product, and hence it is enough to treat the tensor product of two cyclic groups. For $\mathbb Z\otimes_{\mathbb Z} A$, we get $A$ by Example 1, and Proposition 10.20 below shows that $A\otimes_{\mathbb Z}\mathbb Z$ gives the same thing. Problem 3 at the end of the chapter identifies the tensor product of two arbitrary finite cyclic groups $(\mathbb Z/k\mathbb Z)\otimes_{\mathbb Z}(\mathbb Z/l\mathbb Z)$. For now, let us verify in the special case that $\mathrm{GCD}(k, l) = 1$ that $(\mathbb Z/k\mathbb Z)\otimes_{\mathbb Z}(\mathbb Z/l\mathbb Z) = 0$. This tensor product is a unital $\mathbb Z$ module, being an abelian group, and Corollary 10.19a shows that the action by $\mathbb Z$ is given by $c(a\otimes b) = ca\otimes b$ for any integer $c$. Then we have $0 = (k1)\otimes 1 = k(1\otimes 1)$ and $0 = 1\otimes(l1) = (1l)\otimes 1 = (l1)\otimes 1 = l(1\otimes 1)$. Choosing integers $x$ and $y$ such that $xk + yl = 1$, we see that $1\otimes 1 = x(k(1\otimes 1)) + y(l(1\otimes 1)) = 0 + 0 = 0$. The tensor product is generated by the elements $a\otimes b$, each of which equals $ab(1\otimes 1)$, and thus the tensor product is 0.
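The construction in the proof of Theorem 10.18 can be carried out by machine when $R = \mathbb Z$ and $M$, $N$ are finite. The sketch below is illustrative, not the text's method: assuming $M = \mathbb Z/k\mathbb Z$ and $N = \mathbb Z/l\mathbb Z$ with elements represented by residues, it forms the free abelian group $G$ on the $kl$ symbols $e(a, b)$, writes down the first two families of relations in $(*)$ as rows of an integer matrix (for $R = \mathbb Z$ the third family already lies in the subgroup they generate), and reads off $G/H$ from a Smith normal form. The names `smith_diagonal` and `tensor_of_cyclic` are hypothetical. For $\mathrm{GCD}(k, l) = 1$ the result is the zero group, as computed above; for general $k$ and $l$ the output is consistent with the standard fact that the tensor product is cyclic of order $\mathrm{GCD}(k, l)$.

```python
from math import gcd


def smith_diagonal(rows, ncols):
    """Nonzero diagonal entries of a Smith normal form of an integer matrix.

    Rows are relations and columns are generators, so the presented group is
    Z^(ncols - len(diag)) plus the direct sum of the Z/dZ for d in diag.
    """
    a = [list(r) for r in rows]
    nrows = len(a)
    diag = []
    for t in range(min(nrows, ncols)):
        while True:
            # Pick a nonzero entry of least absolute value in the block a[t:, t:].
            best = None
            for i in range(t, nrows):
                for j in range(t, ncols):
                    if a[i][j] and (best is None or abs(a[i][j]) < abs(a[best[0]][best[1]])):
                        best = (i, j)
            if best is None:               # the remaining block is zero
                return diag
            i, j = best
            a[t], a[i] = a[i], a[t]        # move the pivot to position (t, t)
            for row in a:
                row[t], row[j] = row[j], row[t]
            piv = a[t][t]
            clean = True
            for i in range(t + 1, nrows):  # row operations clear column t
                q = a[i][t] // piv
                if q:
                    a[i] = [x - q * y for x, y in zip(a[i], a[t])]
                clean = clean and a[i][t] == 0
            for j in range(t + 1, ncols):  # column operations clear row t
                q = a[t][j] // piv
                if q:
                    for row in a:
                        row[j] -= q * row[t]
                clean = clean and a[t][j] == 0
            if not clean:                  # a smaller remainder appeared; repeat
                continue
            # Make the pivot divide every entry of the remaining block.
            bad = next((i for i in range(t + 1, nrows)
                        if any(a[i][j] % piv for j in range(t + 1, ncols))), None)
            if bad is None:
                diag.append(abs(piv))
                break
            a[t] = [x + y for x, y in zip(a[t], a[bad])]
    return diag


def tensor_of_cyclic(k, l):
    """Present (Z/kZ) (x)_Z (Z/lZ) as G/H; return (free rank, invariant factors > 1)."""
    ngens = k * l

    def idx(a, b):                 # position of the basis element e(a, b) of G
        return a * l + b

    rels = []
    for a1 in range(k):            # e(a1 + a2, b) - e(a1, b) - e(a2, b)
        for a2 in range(k):
            for b in range(l):
                row = [0] * ngens
                row[idx((a1 + a2) % k, b)] += 1
                row[idx(a1, b)] -= 1
                row[idx(a2, b)] -= 1
                rels.append(row)
    for a in range(k):             # e(a, b1 + b2) - e(a, b1) - e(a, b2)
        for b1 in range(l):
            for b2 in range(l):
                row = [0] * ngens
                row[idx(a, (b1 + b2) % l)] += 1
                row[idx(a, b1)] -= 1
                row[idx(a, b2)] -= 1
                rels.append(row)
    diag = smith_diagonal(rels, ngens)
    return ngens - len(diag), sorted(d for d in diag if d > 1)


for k, l in [(2, 3), (4, 6), (5, 5)]:
    print((k, l), tensor_of_cyclic(k, l), "GCD:", gcd(k, l))
```

The relation matrix has $k^2l + kl^2$ rows, so this brute-force presentation is practical only for small $k$ and $l$; its purpose is to make the quotient $G/H$ of the proof tangible, not to compute efficiently.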

(3) $R$ equal to a commutative ring with identity. Then $M$ is an $(R, R)$ bimodule, since any unital left module for a commutative ring is a right module under the definition $mr = rm$ and vice versa. Corollary 10.19 shows therefore that $M\otimes_R N$ is a unital $R$ module. The special case that $R$ is a field was treated in Section VI.6.

(4) $M$ equal to a ring $S$ with $R$ as a subring with the same identity. Then we can regard $S$ as a unital $(S, R)$ bimodule, and Corollary 10.19a shows that $S\otimes_R N$ is a unital left $S$ module. The passage from $N$ to $S\otimes_R N$ is a second kind of change of rings, or extension of scalars, the first kind being the passage from $N$ to $\operatorname{Hom}_R(S, N)$ as in the previous section. Complexification of a real vector space $V$ as $V\otimes_{\mathbb R}\mathbb C$ is an instance of this change of rings by means of tensor products. (Here we are taking into account the isomorphism $V\otimes_{\mathbb R}\mathbb C\cong\mathbb C\otimes_{\mathbb R} V$ given in Proposition 10.20 below.)

(5) $M$ and $N$ equal to associative $R$ algebras with identity over a commutative ring $R$ with identity. Proposition 10.24 below shows that $M\otimes_R N$ is another associative algebra with identity over $R$, with a multiplication such that

$$(m_1\otimes n_1)(m_2\otimes n_2) = m_1m_2\otimes n_1n_2.$$

In this case the additional structure on the tensor product is not a consequence of Corollary 10.19, and additional argument is necessary.
The rest of this section will be devoted to establishing some identities for tensor product, together with their naturality, and to proving that the tensor product over $R$ of two $R$ algebras, for a commutative ring $R$ with identity, is again an $R$ algebra. Each identity involves setting up a homomorphism involving one or more tensor products, and it is necessary to prove in each case that the homomorphism is an isomorphism. For this purpose it is often inconvenient to prove directly that the homomorphism has 0 kernel and is onto. In such cases one constructs what ought to be the inverse homomorphism and proves that it is indeed a two-sided inverse.

Proposition 10.20. Let $R$ be a ring with identity, let $M$ be a unital right $R$ module, and let $N$ be a unital left $R$ module. Let $R^o$ be the opposite ring of $R$, let $M^o$ be $M$ regarded as a left $R^o$ module, and let $N^o$ be $N$ regarded as a right $R^o$ module. Then

$$M\otimes_R N\cong N^o\otimes_{R^o} M^o$$

under the unique homomorphism of abelian groups carrying $m\otimes n$ in $M\otimes_R N$ into $n^o\otimes m^o$ in $N^o\otimes_{R^o} M^o$. The isomorphism is natural in the variables $M$ and $N$.

Remark. To make the proof below a little clearer, we shall distinguish between elements of $M$ and $M^o$, writing $m$ in the first case and $m^o$ in the second case, even though $m^o = m$ under our definitions. A similar notational convention will be in force for $N$.

PROOF. The map $(m, n)\mapsto n^o\otimes m^o$ is additive in each variable and carries $(m, rn)$ to $(rn)^o\otimes m^o = n^or^o\otimes m^o = n^o\otimes r^om^o = n^o\otimes(mr)^o$. This expression is the image also of $(mr, n)$, and hence $(m, n)\mapsto n^o\otimes m^o$ is $R$ bilinear and has an additive extension $\Phi$ to $M\otimes_R N$. Arguing similarly, we readily construct a homomorphism $\Phi': N^o\otimes_{R^o} M^o\to M\otimes_R N$ with $\Phi'(n^o\otimes m^o) = m\otimes n$. It is immediate that $\Phi'$ is a two-sided inverse to $\Phi$, and the isomorphism follows. For the naturality in $M$, suppose that $\varphi: M\to M'$ is an $R$ homomorphism. Write $\varphi^o$ for the homomorphism with $\varphi^o(m^o) = (\varphi(m))^o$. Then

$$(1\otimes\varphi^o)(\Phi(m\otimes n)) = (1\otimes\varphi^o)(n^o\otimes m^o) = n^o\otimes\varphi^o(m^o) = n^o\otimes(\varphi(m))^o = \Phi(\varphi(m)\otimes n) = \Phi((\varphi\otimes 1)(m\otimes n)).$$

This proves the naturality in the $M$ variable, and naturality in the $N$ variable is proved similarly. □

Proposition 10.21. Let $R$ be a ring with identity, let $S$ be a nonempty set, let $M_s$ be a unital right $R$ module for each $s\in S$, and let $N$ be a unital left $R$ module. Then

$$\Big(\bigoplus_{s\in S} M_s\Big)\otimes_R N\cong\bigoplus_{s\in S}(M_s\otimes_R N)$$

as abelian groups, and the isomorphism is natural in the tuple $(\{M_s\}_{s\in S}, N)$.
Remarks. A similar conclusion holds if the direct sum occurs in the second member of the tensor product, as a consequence of Proposition 10.20. The naturality carries with it some additional conclusions. For example, if each $M_s$ is a unital $(T, R)$ bimodule for a ring $T$ with identity, then the displayed isomorphism is an isomorphism of left $T$ modules.

Proof. The map $(\{m_s\}_s, n)\mapsto\{m_s\otimes n\}_s$ is $R$ bilinear from $(\bigoplus_{s\in S} M_s)\times N$ into $\bigoplus_{s\in S}(M_s\otimes_R N)$, and its additive extension $\Phi$ is the homomorphism from left to right in the displayed isomorphism. It has $\Phi(\{m_s\}_s\otimes n) = \{m_s\otimes n\}_s$. To construct the inverse, let $i_s: M_s\to\bigoplus_{t\in S} M_t$ be the $s$th inclusion. Then $(m_s, n)\mapsto i_s(m_s)\otimes n$ is $R$ bilinear into $(\bigoplus_{t\in S} M_t)\otimes_R N$ and has an additive extension carrying $m_s\otimes n$ to $i_s(m_s)\otimes n$ in $(\bigoplus_{t\in S} M_t)\otimes_R N$. The universal mapping property of direct sums of abelian groups then gives us a corresponding abelian-group homomorphism $\Phi': \bigoplus_{s\in S}(M_s\otimes_R N)\to(\bigoplus_{s\in S} M_s)\otimes_R N$. It has $\Phi'(\{m_s\otimes n\}_s) = \{m_s\}_s\otimes n$. It is immediate that $\Phi'\circ\Phi$ fixes each $\{m_s\}_s\otimes n$ and hence is the identity, and that $\Phi\circ\Phi'$ fixes each $\{m_s\otimes n\}_s$ and hence is the identity.

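For the naturality let $\varphi_s: M_s\to M'_s$ be an $R$ homomorphism of right $R$ modules for each $s$, and let $\psi: N\to N'$ be an $R$ homomorphism of left $R$ modules. Then

$$\begin{aligned}\Phi\big((\{\varphi_s\}_s\otimes\psi)(\{m_s\}_s\otimes n)\big) &= \Phi(\{\varphi_s(m_s)\}_s\otimes\psi(n)) = \{\varphi_s(m_s)\otimes\psi(n)\}_s\\ &= \{\varphi_s\otimes\psi\}_s(\{m_s\otimes n\}_s) = \{\varphi_s\otimes\psi\}_s(\Phi(\{m_s\}_s\otimes n)),\end{aligned}$$

and naturality is proved. □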

Proposition 10.22. Let $R$ and $S$ be rings with identity, let $M$ be a unital right $R$ module, let $N$ be a unital $(R, S)$ bimodule, and let $P$ be a unital left $S$ module. Then

$$(M\otimes_R N)\otimes_S P\cong M\otimes_R(N\otimes_S P)$$

under the unique homomorphism $\Phi$ of abelian groups such that $\Phi((m\otimes n)\otimes p) = m\otimes(n\otimes p)$. The isomorphism is natural in the triple $(M, N, P)$.

Remarks. As with Proposition 10.21, the naturality carries with it some additional conclusions. For example, if $T$ is a ring with identity and $M$ is actually a unital $(T, R)$ bimodule, then the isomorphism is one of left $T$ modules.

PROOF. For fixed $p$, the map $(m, n)\mapsto m\otimes(n\otimes p)$ is $R$ bilinear. In fact, the map is certainly additive in $m$ and in $n$. For the transformation law with an element $r$ of $R$, the calculation is $(mr, n)\mapsto mr\otimes(n\otimes p) = m\otimes r(n\otimes p) = m\otimes(rn\otimes p)$, and this is the image of $(m, rn)$.

Thus  for  each  fixed  p,  we  have  a  unique  well-defined  extension,  additive  in 
m  and  n,  carrying  ( in  0  n.  p)  to  m  0  in  0  p).  Using  the  uniqueness,  we  see 
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that  this  extended  map  is  additive  in  the  variables  m  <g>  n  and  p.  Also,  if  5  is  in 
S,  then  ((m  <g>  n)s,  p)  =  (tn  <g>  ns,  p)  maps  to  m  <g>  ( ns  <g>  p)  =  m  <g>  (n  <g>  sp), 
which  is  the  image  of  (m  <g>  n,  sp),  and  therefore  ( m  <g>  n,  p)  i->  m  <g>  (n  <g>  p)  is 
S  bilinear.  Consequently  there  exists  a  homomorphism  <f>  of  abelian  groups  as  in 
the  statement  of  the  proposition. 

A similar argument produces a homomorphism $\Phi'$ of abelian groups carrying the right member of the display to the left member such that $\Phi'(m\otimes(n\otimes p)) = (m\otimes n)\otimes p$. The composites $\Phi'\circ\Phi$ and $\Phi\circ\Phi'$ are additive and fix the generating elements, hence are the identity. This proves the isomorphism.

For the naturality, let $\varphi : M\to M'$, $\psi : N\to N'$, and $\tau : P\to P'$ be maps respecting the appropriate module structure in each case. Then

$$\Phi\bigl(((\varphi\otimes\psi)\otimes\tau)((m\otimes n)\otimes p)\bigr) = \Phi\bigl((\varphi\otimes\psi)(m\otimes n)\otimes\tau(p)\bigr)$$
$$= \Phi\bigl((\varphi(m)\otimes\psi(n))\otimes\tau(p)\bigr) = \varphi(m)\otimes(\psi(n)\otimes\tau(p))$$
$$= (\varphi\otimes(\psi\otimes\tau))(m\otimes(n\otimes p)) = (\varphi\otimes(\psi\otimes\tau))\bigl(\Phi((m\otimes n)\otimes p)\bigr),$$

and naturality is proved. □

Proposition 10.23. Let $R$ and $S$ be rings with identity, let $M$ be a unital left $R$ module, let $N$ be a unital $(S, R)$ bimodule, and let $P$ be a unital left $S$ module. Then

$$\operatorname{Hom}_S(N\otimes_R M, P) \cong \operatorname{Hom}_R(M, \operatorname{Hom}_S(N, P))$$

under the homomorphism $\Phi$ of abelian groups defined by $\Phi(\varphi)(m)(n) = \varphi(n\otimes m)$ for $m\in M$, $n\in N$, and $\varphi\in\operatorname{Hom}_S(N\otimes_R M, P)$. The isomorphism is natural in the variables $(N, M)$ and $P$.

Remarks. In the displayed isomorphism, $N\otimes_R M$ on the left side is automatically a left $S$ module, and hence $\operatorname{Hom}_S(N\otimes_R M, P)$ is a well-defined abelian group. For the right side, Proposition 10.17 shows that $\operatorname{Hom}_S(N, P)$ is a left $R$ module under the definition $(r\tau)(n) = \tau(nr)$; consequently $\operatorname{Hom}_R(M, \operatorname{Hom}_S(N, P))$ is a well-defined abelian group. The naturality in the conclusion allows one to conclude, for example, that if $M$ is in fact a unital $(R, T)$ bimodule for a ring $T$ with identity, then the displayed isomorphism is an isomorphism of left $T$ modules.

PROOF. The homomorphism $\Phi$ is well defined. We construct its inverse. If $\psi$ is in $\operatorname{Hom}_R(M, \operatorname{Hom}_S(N, P))$, then the map $(n, m)\mapsto\psi(m)(n)$ sends $(nr, m)$ to $\psi(m)(nr) = (r(\psi(m)))(n) = (\psi(rm))(n)$, and this is the image of $(n, rm)$. Hence $(n, m)\mapsto\psi(m)(n)$ is $R$ bilinear and yields a map of $N\otimes_R M$ into $P$ such that $n\otimes m$ maps to $\psi(m)(n)$. The latter map is an $S$ homomorphism since $sn\otimes m$ maps to $\psi(m)(sn) = s(\psi(m)(n))$, which is $s$ applied to the image of $n\otimes m$. We define $\Phi'(\psi)$ to be the map defined on $N\otimes_R M$ with $\Phi'(\psi)(n\otimes m) = \psi(m)(n)$. Then $\Phi'(\Phi(\varphi))(n\otimes m) = \Phi(\varphi)(m)(n) = \varphi(n\otimes m)$ shows that $\Phi'\circ\Phi$ is the identity, and $\Phi(\Phi'(\psi))(m)(n) = \Phi'(\psi)(n\otimes m) = \psi(m)(n)$ shows that $\Phi\circ\Phi'$ is the identity. Hence $\Phi$ is an isomorphism of abelian groups.

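For naturality in $(N, M)$, let $\sigma : N'\to N$ and $\tau : M'\to M$ be given. Then

$$\Phi\bigl(\operatorname{Hom}(\sigma\otimes\tau, 1)(\varphi)\bigr)(m')(n') = \bigl(\operatorname{Hom}(\sigma\otimes\tau, 1)(\varphi)\bigr)(n'\otimes m')$$
$$= \varphi(\sigma\otimes\tau)(n'\otimes m') = \varphi(\sigma(n')\otimes\tau(m')) = \Phi(\varphi)(\tau(m'))(\sigma(n'))$$
$$= \operatorname{Hom}(\tau, \operatorname{Hom}(\sigma, 1))(\Phi(\varphi))(m')(n'),$$

and naturality is proved in $(N, M)$. For naturality in $P$, let $\sigma : P\to P'$ be given. Then

$$\Phi\bigl(\operatorname{Hom}(1, \sigma)(\varphi)\bigr)(m)(n) = \bigl(\operatorname{Hom}(1, \sigma)(\varphi)\bigr)(n\otimes m) = \sigma\varphi(n\otimes m)$$
$$= \sigma\bigl(\Phi(\varphi)(m)(n)\bigr) = \operatorname{Hom}(1, \operatorname{Hom}(1, \sigma))(\Phi(\varphi))(m)(n),$$

and naturality is proved in $P$. □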
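When $R = S = \mathbb Z$ and all three modules are small finite cyclic groups, the isomorphism $\Phi$ of Proposition 10.23 can be checked by brute force. By the universal mapping property, members of $\operatorname{Hom}_{\mathbb Z}(N\otimes_{\mathbb Z}M, P)$ correspond to $\mathbb Z$ bilinear maps $\beta : N\times M\to P$, and in those terms $\Phi$ is the "currying" $\beta\mapsto(m\mapsto(n\mapsto\beta(n, m)))$. The Python sketch below assumes $N = \mathbb Z/a\mathbb Z$, $M = \mathbb Z/b\mathbb Z$, $P = \mathbb Z/c\mathbb Z$; the function names are hypothetical, and the exhaustive search is only an illustration for very small $a$, $b$, $c$.

```python
from itertools import product


def homs(a, c):
    """All additive maps Z/a -> Z/c, each stored as its tuple of values."""
    return [f for f in product(range(c), repeat=a)
            if all(f[(x + y) % a] == (f[x] + f[y]) % c
                   for x in range(a) for y in range(a))]


def bilinear_maps(a, b, c):
    """All Z bilinear maps Z/a x Z/b -> Z/c, as dicts keyed by (n, m)."""
    pairs = [(n, m) for n in range(a) for m in range(b)]
    found = []
    for values in product(range(c), repeat=len(pairs)):
        f = dict(zip(pairs, values))
        if (all(f[((n1 + n2) % a, m)] == (f[(n1, m)] + f[(n2, m)]) % c
                for n1 in range(a) for n2 in range(a) for m in range(b))
                and all(f[(n, (m1 + m2) % b)] == (f[(n, m1)] + f[(n, m2)]) % c
                        for n in range(a) for m1 in range(b) for m2 in range(b))):
            found.append(f)
    return found


def curried_maps(a, b, c):
    """All additive psi: Z/b -> Hom(Z/a, Z/c), as tuples (psi(0), ..., psi(b-1))."""
    hom_np = homs(a, c)

    def add(f, g):
        # Pointwise sum in the abelian group Hom(Z/a, Z/c)
        return tuple((x + y) % c for x, y in zip(f, g))

    return [psi for psi in product(hom_np, repeat=b)
            if all(psi[(m1 + m2) % b] == add(psi[m1], psi[m2])
                   for m1 in range(b) for m2 in range(b))]


def curry(beta, a, b):
    """Phi(beta)(m)(n) = beta(n, m), in the same format as curried_maps."""
    return tuple(tuple(beta[(n, m)] for n in range(a)) for m in range(b))


def phi_is_bijective(a, b, c):
    """True if currying is a bijection from bilinear maps onto Hom(M, Hom(N, P))."""
    bilinear = bilinear_maps(a, b, c)
    images = {curry(beta, a, b) for beta in bilinear}
    return len(images) == len(bilinear) and images == set(curried_maps(a, b, c))


print(phi_is_bijective(2, 2, 4), len(bilinear_maps(2, 2, 4)))
```

The search over all functions grows like $c^{ab}$, so the sketch shows what $\Phi$ does rather than offering a practical algorithm.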

Proposition 10.24. Let $R$ be a commutative ring with identity, and let $M$ and $N$ be associative $R$ algebras with identity. Then $M\otimes_R N$ is an associative $R$ algebra with identity under the unique multiplication law satisfying

$$(m\otimes n)(m'\otimes n') = mm'\otimes nn'.$$

PROOF. What we know from Example 3 is that $M\otimes_R N$ is a unital $R$ module. We need to define the associative-algebra multiplication in $M\otimes_R N$ and check that it satisfies the required properties.

Let $\mu(m)$ and $\nu(n)$ be the left multiplication operators in $M$ and $N$ defined by $\mu(m)(m') = mm'$ and $\nu(n)(n') = nn'$. The fact that $R$ is central in $M$ means that $\mu(m)(rm') = mrm' = rmm' = r\mu(m)(m')$ and hence that the mapping $\mu(m) : M\to M$ is a homomorphism of $R$ modules. Similarly $\nu(n) : N\to N$ is a homomorphism of $R$ modules. Therefore $\mu(m)\otimes\nu(n)$ is a well-defined homomorphism of abelian groups for each $(m, n)$ in $M\times N$, and $b(m, n) = \mu(m)\otimes\nu(n)$ is a well-defined map of $M\times N$ into the abelian group $\operatorname{End}_{\mathbb Z}(M\otimes_R N)$. The map $b$ is certainly additive in the $M$ variable and in the $N$ variable. If $r$ is in $R$, then $b(mr, n) = \mu(mr)\otimes\nu(n)$. Since

$$(\mu(mr)\otimes\nu(n))(m'\otimes n') = mrm'\otimes nn' = mm'r\otimes nn'$$
$$= mm'\otimes rnn' = (\mu(m)\otimes\nu(rn))(m'\otimes n'),$$

we see that $b(mr, n) = b(m, rn)$. Thus $b$ is $R$ bilinear and extends to a homomorphism $L : M\otimes_R N\to\operatorname{End}_{\mathbb Z}(M\otimes_R N)$ of abelian groups.
For $x$ and $y$ in $M\otimes_R N$, we define a product by $xy = L(x)(y)$. Since $L(x)$ is in $\operatorname{End}_{\mathbb Z}(M\otimes_R N)$, we have $x(y_1 + y_2) = xy_1 + xy_2$. Since $L$ is a homomorphism, $L(x_1 + x_2) = L(x_1) + L(x_2)$, and therefore $(x_1 + x_2)y = x_1y + x_2y$. The element $1_M\otimes 1_N$, where $1_M$ and $1_N$ are the respective identities of $M$ and $N$, is a two-sided identity for $M\otimes_R N$. Since $M\otimes_R N$ is a two-sided unital $R$ module, we have $rx = xr$, and thus $R(1_M\otimes 1_N)$ lies in the center of $M\otimes_R N$. Therefore the product operation is $R$ linear in each variable.

Suppose that $x = m\otimes n$ and $y = m'\otimes n'$. Then we have

$$xy = L(x)(y) = L(m\otimes n)(m'\otimes n') = b(m, n)(m'\otimes n')$$
$$= (\mu(m)\otimes\nu(n))(m'\otimes n') = mm'\otimes nn',$$

as asserted in the statement of the proposition. Consequently

$$(m\otimes n)\bigl((m'\otimes n')(m''\otimes n'')\bigr) = (m\otimes n)(m'm''\otimes n'n'') = m(m'm'')\otimes n(n'n'')$$
$$= (mm')m''\otimes(nn')n'' = (mm'\otimes nn')(m''\otimes n'')$$
$$= \bigl((m\otimes n)(m'\otimes n')\bigr)(m''\otimes n'').$$

This proves associativity of multiplication on elements of the form $m\otimes n$. Since these elements generate the tensor product as an abelian group and since the distributive laws hold, associativity holds in general. □
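A concrete model of the multiplication law in Proposition 10.24: for matrix algebras over a commutative ring, the Kronecker product of matrices realizes the isomorphism $M_p(R)\otimes_R M_q(R)\cong M_{pq}(R)$, carrying $A\otimes B$ to the $pq$-by-$pq$ matrix whose $(i, j)$th $q$-by-$q$ block is $A_{ij}B$. The rule $(m\otimes n)(m'\otimes n') = mm'\otimes nn'$ then becomes the mixed-product identity for Kronecker products. The Python sketch below, with integer entries and hypothetical helper names, checks that identity and the identity element $1_M\otimes 1_N$ on one small example; it illustrates the formula and is not a proof.

```python
def matmul(X, Y):
    """Product of matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


def kron(A, B):
    """Kronecker product: the (i, j) block of the result is A[i][j] * B."""
    p, q = len(B), len(B[0])
    return [[A[i // p][j // q] * B[i % p][j % q]
             for j in range(len(A[0]) * q)]
            for i in range(len(A) * p)]


def identity(n):
    """The n-by-n identity matrix."""
    return [[int(i == j) for j in range(n)] for i in range(n)]


# m, m_prime play the role of elements of M = M_2(Z); n, n_prime of N = M_3(Z).
m, m_prime = [[1, 2], [0, -1]], [[3, 0], [1, 1]]
n, n_prime = [[0, 1, 2], [1, 0, 0], [2, -1, 1]], [[1, 1, 0], [0, 2, 1], [3, 0, -1]]

lhs = matmul(kron(m, n), kron(m_prime, n_prime))        # (m ⊗ n)(m' ⊗ n')
rhs = kron(matmul(m, m_prime), matmul(n, n_prime))      # mm' ⊗ nn'
print(lhs == rhs)                                       # mixed-product identity
print(kron(identity(2), identity(3)) == identity(6))    # 1_M ⊗ 1_N is the identity
```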


6.  Exact  Sequences 


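Consider a diagram of abelian groups and group homomorphisms of the form

$$\cdots\xrightarrow{\ \varphi_{n-1}\ } M_{n-1}\xrightarrow{\ \varphi_n\ } M_n\xrightarrow{\ \varphi_{n+1}\ } M_{n+1}\xrightarrow{\ \varphi_{n+2}\ }\cdots,$$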

where $M_{n-1}$, $M_n$, $M_{n+1}$, etc., are abelian groups and $\varphi_{n-1}$, $\varphi_n$, $\varphi_{n+1}$, $\varphi_{n+2}$, etc., are homomorphisms. The diagram can be finite or infinite, and the particular kind of indexing is not important. The sequence in question is called a complex if all consecutive compositions are 0, i.e., if $\varphi_{k+1}\varphi_k = 0$ for all $k$. This condition is equivalent to having $\operatorname{image}(\varphi_k)\subseteq\ker(\varphi_{k+1})$ and is the backdrop for the traditional definitions of homology and cohomology groups, which are the various quotients

$$\ker(\varphi_{k+1})/\operatorname{image}(\varphi_k).$$


Examples  of  complexes. 

(1) The simplicial homology of a simplicial complex. For this situation the indexing is reversed (say by replacing $n$ by $-n$), so that the homomorphisms lower the index. Each group $M_n$ is a group whose elements are called “chains,” and the homomorphisms are called “boundary maps.” The chains in the kernel of one of the homomorphisms are said to be “cycles,” and those in the image of a homomorphism are said to be “boundaries.” The quotient of the two, taking into account the reversal of the indexing, is the system of simplicial homology groups of the simplicial complex.

(2) The de Rham cohomology of a smooth manifold. For this situation the indexing goes upward as indicated, the group $M_n$ is the vector space of smooth differential forms of degree $n$, and the homomorphisms are the restrictions to these spaces of the linear de Rham operator $d$. Then $\ker(\varphi_{n+1})$ is the vector subspace of “closed” forms, $\operatorname{image}(\varphi_n)$ is the vector subspace of “exact” forms, and the quotient $\ker(\varphi_{n+1})/\operatorname{image}(\varphi_n)$ is the $n$th de Rham cohomology space of the manifold.

(3)  Cohomology  of  groups.  This  was  defined  in  Section  VII.6,  knowledge 
of  which  is  not  assumed  in  the  present  chapter.  The  result  that  shows  that  the 
appropriate  sequence  is  a  complex  is  Proposition  7.39,  for  which  we  gave  a  direct 
but  complicated  combinatorial  proof. 
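To make Example (1) concrete, the Python sketch below writes down the boundary maps of a filled triangle with vertices $v_0, v_1, v_2$, using the common convention $\partial[v_iv_j] = v_j - v_i$ and $\partial[v_0v_1v_2] = [v_1v_2] - [v_0v_2] + [v_0v_1]$ (an assumption; orientation conventions vary). It checks that the consecutive composition is 0 and computes the dimensions of the homology groups with rational coefficients from matrix ranks. The helper names are illustrative only.

```python
from fractions import Fraction


def matmul(X, Y):
    """Product of integer matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


def rank(matrix):
    """Rank over Q by Gaussian elimination with exact fractions."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    r = 0
    for col in range(len(rows[0]) if rows else 0):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r


# C_0 has basis v0, v1, v2; C_1 has basis [v0v1], [v0v2], [v1v2];
# C_2 has basis [v0v1v2].  Column j is the boundary of the j-th basis element.
D1 = [[-1, -1, 0],
      [1, 0, -1],
      [0, 1, 1]]          # boundary map C_1 -> C_0
D2 = [[1], [-1], [1]]     # boundary map C_2 -> C_1

is_complex = all(v == 0 for row in matmul(D1, D2) for v in row)

# dim H_k = dim C_k - rank(boundary out of C_k) - rank(boundary into C_k)
r1, r2 = rank(D1), rank(D2)
filled = (3 - r1, 3 - r1 - r2, 1 - r2)   # filled triangle: (H_0, H_1, H_2)
hollow = (3 - r1, 3 - r1)                # hollow triangle, no 2-simplex: (H_0, H_1)
print(is_complex, filled, hollow)
```

The hollow triangle has a one-dimensional $H_1$, which detects the loop, while adding the 2-simplex makes that cycle a boundary.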

The above sequence is said to be exact at $M_n$ if $\ker(\varphi_{n+1}) = \operatorname{image}(\varphi_n)$. It is said to be an exact sequence if it is exact at every group in the sequence. The condition of exactness may be viewed as having two parts to it. One is the inclusion $\operatorname{image}(\varphi_n)\subseteq\ker(\varphi_{n+1})$ that enters the definition of complex. Since this condition says that $\varphi_{n+1}\varphi_n = 0$, it is often easy to check. The other condition is the reverse inclusion $\ker(\varphi_{n+1})\subseteq\operatorname{image}(\varphi_n)$, which often is more difficult to check.

The  extent  to  which  a  complex  fails  to  be  exact  plays  a  fundamental  role  in  the 
subject  of  homological  algebra.  This  is  a  subject  that  for  the  most  part  is  left  to 
Chapter  IV  of  Advanced  Algebra.  That  chapter  will  put  the  examples  above  into 
a  wider  context,  and  it  will  develop  techniques  for  working  with  homology  and 
cohomology. In the present section we shall give the barest hint of an introduction to the subject by discussing some of the effects of the Hom functor and the tensor product functor on exact sequences.

Let us establish a setting for applying a functor $F$ to an exact sequence or more general complex. For current purposes we have in mind that $F$ is Hom in one of its two variables or is tensor product in one of its two variables. First we need to have two categories available so that $F$ carries the one category to the other. These categories will have to satisfy some properties, but we shall not attempt to list such properties at this time.⁵ Let us be content with some familiar examples of categories whose objects are abelian groups with additional structure and whose morphisms are group homomorphisms with additional structure. Specifically let $R$ be a ring with identity, let $\mathcal C_R$ be the category of all unital left $R$ modules, and let $\mathcal V_R$ be the category of all unital right $R$ modules. We suppose that our functor $F$ carries some $\mathcal C_R$ or $\mathcal V_R$ to another such category, possibly for a different ring. The functor $F$ can be covariant or contravariant. We require also of $F$ that it be an additive functor, i.e., that $F(\varphi_1 + \varphi_2) = F(\varphi_1) + F(\varphi_2)$ for any maps $\varphi_1$ and $\varphi_2$ that lie in the same Hom group.

⁵ The appropriate notion is that of an “abelian category,” which is defined in Section IV.8 of Advanced Algebra.

With the additional structure in place, we can now introduce the notions of complex and exact sequence for the domain and range categories of $F$, not just for the category of abelian groups. In this case the abelian groups in the sequence are to be objects in the category, and the group homomorphisms in the sequence are to be morphisms in the category; otherwise the definitions are unchanged. The condition that $F$ be additive implies that $F$ carries any 0 map to a 0 map, since $F(0) = F(0 + 0) = F(0) + F(0)$, and that property will be key for us. In fact, we can apply $F$ to any complex in the domain category (by applying it to each object and morphism in the sequence); after $F$ is applied, the arrows point the same way if $F$ is covariant, and they point the opposite way if $F$ is contravariant. If $F$ is covariant, it sends any consecutive composition $0 = \varphi_{k+1}\varphi_k$ to $0 = F(0) = F(\varphi_{k+1}\varphi_k) = F(\varphi_{k+1})F(\varphi_k)$; therefore the consecutive composition of $F$ of the maps is 0, and we obtain a complex. If $F$ is contravariant, we have $0 = F(0) = F(\varphi_{k+1}\varphi_k) = F(\varphi_k)F(\varphi_{k+1})$; the consecutive composition of $F$ of the maps is still 0, and we still obtain a complex. Thus the additive functor $F$ sends any complex to a complex.

However, additive functors need not send exact sequences to exact sequences, as we shall see with Hom and tensor product in the category $\mathcal C_{\mathbb Z}$. Yet these functors each preserve some features of certain exact sequences, even when $\mathbb Z$ is replaced by a general ring with identity. To be precise we introduce the following definition.

A short exact sequence in our category is an exact sequence of the form

$$0\longrightarrow M\xrightarrow{\ \varphi\ } N\xrightarrow{\ \psi\ } P\longrightarrow 0.$$

Exactness of this sequence incorporates three conditions:

(i) $\varphi$ is one-one,

(ii) $\ker\psi = \operatorname{image}\varphi$,

(iii) $\psi$ is onto.

In fact, the three conditions are precisely the conditions of exactness at $M$, $N$, and $P$, respectively, since the maps at either end are 0 maps. If we think of $\varphi$ as an inclusion map, then the short exact sequence corresponds to the isomorphism $N/M\cong P$ obtained because $\psi$ factors through to the quotient $N/M$.
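For finite abelian groups the three conditions can be checked by direct enumeration. The Python sketch below does this for the sequence $0\to\mathbb Z/2\mathbb Z\xrightarrow{\varphi}\mathbb Z/4\mathbb Z\xrightarrow{\psi}\mathbb Z/2\mathbb Z\to 0$ with $\varphi(x) = 2x$ and $\psi(y) = y \bmod 2$, an illustrative choice not taken from the text. Groups are modeled as ranges of residues, the helper names are hypothetical, and the maps are assumed, not checked, to be homomorphisms.

```python
def image(f, domain):
    """The set of values of f on a finite domain."""
    return {f(x) for x in domain}


def kernel(f, domain, zero=0):
    """The elements of a finite domain that f sends to zero."""
    return {x for x in domain if f(x) == zero}


def is_short_exact(phi, psi, M, N, P):
    """Check conditions (i)-(iii) for 0 -> M -> N -> P -> 0.

    M, N, P are finite collections of group elements; phi and psi are
    assumed to be homomorphisms (this is not verified here)."""
    one_one = len(image(phi, M)) == len(M)          # (i)
    exact_at_N = kernel(psi, N) == image(phi, M)    # (ii)
    onto = image(psi, N) == set(P)                  # (iii)
    return one_one and exact_at_N and onto


def phi(x):
    return (2 * x) % 4   # Z/2Z -> Z/4Z, x |-> 2x


def psi(y):
    return y % 2         # Z/4Z -> Z/2Z, reduction mod 2


Z2, Z4 = range(2), range(4)
print(is_short_exact(phi, psi, Z2, Z4, Z2))            # a short exact sequence
print(is_short_exact(phi, lambda y: 0, Z2, Z4, Z2))    # zero map in place of psi
```

With the zero map in place of $\psi$, condition (iii) fails and so does (ii), since the kernel becomes all of $\mathbb Z/4\mathbb Z$.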

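Proposition 10.25. Let $R$ be a ring with identity, let

$$0\longrightarrow M\xrightarrow{\ \varphi\ } N\xrightarrow{\ \psi\ } P\longrightarrow 0$$

be a short exact sequence in the category $\mathcal C_R$, let $E$ be a module in $\mathcal C_R$, and let $E'$ be a module in $\mathcal V_R$. Then the following sequences in $\mathcal C_{\mathbb Z}$ are exact:

$$E'\otimes_R M\xrightarrow{\ 1\otimes\varphi\ } E'\otimes_R N\xrightarrow{\ 1\otimes\psi\ } E'\otimes_R P\longrightarrow 0,$$

$$0\longrightarrow\operatorname{Hom}_R(E, M)\xrightarrow{\ \operatorname{Hom}(1,\varphi)\ }\operatorname{Hom}_R(E, N)\xrightarrow{\ \operatorname{Hom}(1,\psi)\ }\operatorname{Hom}_R(E, P),$$

$$\operatorname{Hom}_R(M, E)\xleftarrow{\ \operatorname{Hom}(\varphi,1)\ }\operatorname{Hom}_R(N, E)\xleftarrow{\ \operatorname{Hom}(\psi,1)\ }\operatorname{Hom}_R(P, E)\longleftarrow 0.$$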

Remarks. Similarly tensor product in the first variable, which carries $\mathcal V_R$ to $\mathcal C_{\mathbb Z}$, retains the same exactness as in the first of these three sequences. In each case when we specialize to $R = \mathbb Z$, there are examples to show that exactness fails if we try to include the expected remaining 0 in the above three sequences. We give such examples after the proof of the proposition.

PROOF. For the first sequence in $\mathcal C_{\mathbb Z}$, we are to show that $1\otimes\psi$ is onto $E'\otimes_R P$ and that every member of the kernel of $1\otimes\psi$ is in the image of $1\otimes\varphi$. (Recall that $\ker(1\otimes\psi)\supseteq\operatorname{image}(1\otimes\varphi)$ since the sequence is automatically a complex.)

Thus let $e\in E'$ and $p\in P$ be given. Since $\psi : N\to P$ is onto, choose $n\in N$ with $\psi(n) = p$. Then $(1\otimes\psi)(e\otimes n) = e\otimes p$. The elements $e\otimes p$ generate $E'\otimes_R P$ as an abelian group, and hence $1\otimes\psi$ is onto $E'\otimes_R P$.

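To show that $\ker(1\otimes\psi)\subseteq\operatorname{image}(1\otimes\varphi)$, we observe from the exactness of the given sequence at $N$ that $E'\otimes_R\ker\psi = E'\otimes_R\operatorname{image}\varphi$, where here and below $E'\otimes_R\ker\psi$ denotes the subgroup of $E'\otimes_R N$ generated by all $e\otimes k$ with $k\in\ker\psi$. This subgroup is generated by all elements $e\otimes\varphi(m)$, hence by all elements $(1\otimes\varphi)(e\otimes m)$. Therefore $E'\otimes_R\operatorname{image}\varphi = \operatorname{image}(1\otimes\varphi)$, and it is enough to prove that

$$\ker(1\otimes\psi)\subseteq E'\otimes_R\ker\psi. \qquad (*)$$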

To prove (*), we use the fact that $\psi$ is onto $P$. Define $W = E'\otimes_R\ker\psi$ as a subgroup of $E'\otimes_R N$, and let $q : E'\otimes_R N\to(E'\otimes_R N)/W$ be the quotient homomorphism. Define $b : E'\times P\to(E'\otimes_R N)/W$ by

$$b(e, p) = (e\otimes n) + W,\quad\text{where $n$ is chosen such that } \psi(n) = p.$$

The expression $b(e, p)$ does not depend on the choice of the element $n$ having $\psi(n) = p$ since another choice $n'$ will differ from $n$ by a member of $\ker\psi$ and will affect the definition only by a member of $W$. The function $b$ is certainly additive in each variable, and it has $b(er, p) = b(e, rp)$ for $r\in R$ as well, because $\psi(n) = p$ implies $\psi(rn) = rp$ and $er\otimes n = e\otimes rn$. Thus $b$ is $R$ bilinear. Let $L : E'\otimes_R P\to(E'\otimes_R N)/W$ be the additive extension. From $b(e, \psi(n)) = (e\otimes n) + W$, we see that $L(e\otimes\psi(n)) = (e\otimes n) + W$, hence that $L\circ(1\otimes\psi) = q$. This formula shows that $\ker(1\otimes\psi)\subseteq\ker q = W$, and this is the inclusion (*).

For the second sequence in $\mathcal C_{\mathbb Z}$, we are to show that $\operatorname{Hom}(1, \varphi)$ is one-one and that every member of the kernel of $\operatorname{Hom}(1, \psi)$ is in the image of $\operatorname{Hom}(1, \varphi)$. If $\sigma$ is in $\operatorname{Hom}_R(E, M)$ with $\operatorname{Hom}(1, \varphi)(\sigma) = 0$, then $\varphi(\sigma(e)) = 0$ for all $e\in E$. Since $\varphi$ is one-one, $\sigma(e) = 0$ for all $e$, and $\sigma = 0$.

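If $\tau$ in $\operatorname{Hom}_R(E, N)$ is in the kernel of $\operatorname{Hom}(1, \psi)$, so that $\psi(\tau(e)) = 0$ for all $e\in E$, then $\tau(e) = \varphi(m)$ for some $m\in M$ depending on $e$, by exactness of the given sequence at $N$; the element $m$ is unique because $\varphi$ is one-one. Define $\tau'$ in $\operatorname{Hom}_R(E, M)$ by $\tau'(e) = {}$this $m$; the uniqueness of $m$ for each $e$ ensures that $\tau'$ is well defined, and $\tau'$ is an $R$ homomorphism because $\varphi$ is a one-one $R$ homomorphism. Then we have $\tau(e) = \varphi(m) = \varphi(\tau'(e))$, and we conclude that $\tau = \operatorname{Hom}(1, \varphi)(\tau')$.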

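For the third sequence in $\mathcal C_{\mathbb Z}$, we are to show that $\operatorname{Hom}(\psi, 1)$ is one-one and that every member of the kernel of $\operatorname{Hom}(\varphi, 1)$ is in the image of $\operatorname{Hom}(\psi, 1)$. If $\sigma$ is in $\operatorname{Hom}_R(P, E)$ with $\operatorname{Hom}(\psi, 1)(\sigma) = 0$, then $\sigma(\psi(n)) = 0$ for all $n$ in $N$. Since $\psi$ carries $N$ onto $P$, $\sigma = 0$.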

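If $\tau$ in $\operatorname{Hom}_R(N, E)$ is in the kernel of $\operatorname{Hom}(\varphi, 1)$, then $\operatorname{Hom}(\varphi, 1)(\tau) = 0$. So $\tau(\varphi(m)) = 0$ for all $m\in M$. Thus $\tau$ vanishes on $\operatorname{image}\varphi = \ker\psi$, and $\tau$ descends to an $R$ homomorphism of $N/\ker\psi$ into $E$. Composing with the inverse of the isomorphism $N/\ker\psi\cong P$ induced by $\psi$, we obtain $\bar\tau$ in $\operatorname{Hom}_R(P, E)$ with $\tau = \bar\tau\psi$. That is, $\tau = \operatorname{Hom}(\psi, 1)(\bar\tau)$. □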
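The second and third sequences of Proposition 10.25 can also be tested by enumeration when everything is finite. The Python sketch below takes $R = \mathbb Z$, the short exact sequence $0\to\mathbb Z/2\mathbb Z\xrightarrow{\varphi}\mathbb Z/4\mathbb Z\xrightarrow{\psi}\mathbb Z/2\mathbb Z\to 0$ with $\varphi(x) = 2x$ and $\psi$ reduction mod 2, and $E = \mathbb Z/2\mathbb Z$; these choices and the helper names are illustrative assumptions. For each sequence it reports whether the first map is one-one, whether the sequence is exact in the middle, and whether the last map is onto. The first two hold, as the proposition asserts, while the third fails here, so the extra 0 cannot be added.

```python
from itertools import product


def homs(a, c):
    """All additive maps Z/a -> Z/c, each stored as its tuple of values."""
    return [f for f in product(range(c), repeat=a)
            if all(f[(x + y) % a] == (f[x] + f[y]) % c
                   for x in range(a) for y in range(a))]


def compose(g, f):
    """g ∘ f for maps stored as value tuples."""
    return tuple(g[v] for v in f)


phi = (0, 2)          # Z/2Z -> Z/4Z, x |-> 2x
psi = (0, 1, 0, 1)    # Z/4Z -> Z/2Z, reduction mod 2
ZERO = (0, 0)         # zero map Z/2Z -> Z/2Z

# Covariant case: Hom(E, .) with E = Z/2Z.
hom_EM, hom_EN, hom_EP = homs(2, 2), homs(2, 4), homs(2, 2)
after_phi = {s: compose(phi, s) for s in hom_EM}    # Hom(1, phi)
after_psi = {t: compose(psi, t) for t in hom_EN}    # Hom(1, psi)
ker_after_psi = {t for t in hom_EN if after_psi[t] == ZERO}
covariant = (
    len(set(after_phi.values())) == len(hom_EM),    # Hom(1, phi) one-one
    ker_after_psi == set(after_phi.values()),       # exact at Hom(E, N)
    set(after_psi.values()) == set(hom_EP),         # Hom(1, psi) onto?
)

# Contravariant case: Hom(., E) with E = Z/2Z.
hom_ME, hom_NE, hom_PE = homs(2, 2), homs(4, 2), homs(2, 2)
before_psi = {s: compose(s, psi) for s in hom_PE}   # Hom(psi, 1)
before_phi = {t: compose(t, phi) for t in hom_NE}   # Hom(phi, 1)
ker_before_phi = {t for t in hom_NE if before_phi[t] == ZERO}
contravariant = (
    len(set(before_psi.values())) == len(hom_PE),   # Hom(psi, 1) one-one
    ker_before_phi == set(before_psi.values()),     # exact at Hom(N, E)
    set(before_phi.values()) == set(hom_ME),        # Hom(phi, 1) onto?
)
print(covariant, contravariant)
```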

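Examples of failure of exactness in $\mathcal C_{\mathbb Z}$. We start from the exact sequence

$$0\longrightarrow\mathbb Z\xrightarrow{\ \varphi\ }\mathbb Z\xrightarrow{\ \psi\ }\mathbb Z/2\mathbb Z\longrightarrow 0,$$

where $\varphi$ is multiplication by 2 and $\psi$ is the usual quotient homomorphism.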

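(1) We apply $\mathbb Z/2\mathbb Z\otimes_{\mathbb Z}(\,\cdot\,)$ to the given exact sequence, and the claim is that $1\otimes\varphi : (\mathbb Z/2\mathbb Z\otimes_{\mathbb Z}\mathbb Z)\to(\mathbb Z/2\mathbb Z\otimes_{\mathbb Z}\mathbb Z)$ is not one-one. In fact, $\mathbb Z/2\mathbb Z\otimes_{\mathbb Z}\mathbb Z\cong\mathbb Z/2\mathbb Z$, and $1\otimes\varphi$ acts as multiplication by 2 under the isomorphism, hence is the 0 map on a nonzero group and is not one-one.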

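(2) We apply $\operatorname{Hom}_{\mathbb Z}(\mathbb Z/2\mathbb Z, \,\cdot\,)$ to the given exact sequence, and the claim is that $\operatorname{Hom}(1, \psi) : \operatorname{Hom}_{\mathbb Z}(\mathbb Z/2\mathbb Z, \mathbb Z)\to\operatorname{Hom}_{\mathbb Z}(\mathbb Z/2\mathbb Z, \mathbb Z/2\mathbb Z)$ is not onto. In fact, $\operatorname{Hom}_{\mathbb Z}(\mathbb Z/2\mathbb Z, \mathbb Z) = 0$ because $\mathbb Z$ has no nonzero element of finite order, and the identity map in $\operatorname{Hom}_{\mathbb Z}(\mathbb Z/2\mathbb Z, \mathbb Z/2\mathbb Z)$ is nonzero; therefore $\operatorname{Hom}(1, \psi)$ cannot be onto.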

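(3) We apply $\operatorname{Hom}_{\mathbb Z}(\,\cdot\,, \mathbb Z/2\mathbb Z)$ to the given exact sequence, and the claim is that $\operatorname{Hom}(\varphi, 1) : \operatorname{Hom}_{\mathbb Z}(\mathbb Z, \mathbb Z/2\mathbb Z)\to\operatorname{Hom}_{\mathbb Z}(\mathbb Z, \mathbb Z/2\mathbb Z)$ is not onto. In fact, $\operatorname{Hom}(\varphi, 1)$ is premultiplication by 2 and carries any $\sigma$ in $\operatorname{Hom}_{\mathbb Z}(\mathbb Z, \mathbb Z/2\mathbb Z)$ to the homomorphism $k\mapsto\sigma(2k) = 2\sigma(k) = 0$. Since the usual quotient homomorphism $\mathbb Z\to\mathbb Z/2\mathbb Z$ is a nonzero member of $\operatorname{Hom}_{\mathbb Z}(\mathbb Z, \mathbb Z/2\mathbb Z)$, $\operatorname{Hom}(\varphi, 1)$ is not onto $\operatorname{Hom}_{\mathbb Z}(\mathbb Z, \mathbb Z/2\mathbb Z)$.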


7.  Problems 

1. Suppose that the commutative ring $R$ is an integral domain. As usual, the $R$ submodules of $R$ are the ideals. Prove that the ideals satisfy the descending chain condition if and only if $R$ is a field.
2. Let $\mathbb F = \mathbb F_2$ be a field with two elements.

(a) Give an example of a representation of the cyclic group $C_2$ on $\mathbb F^2$ with the property that there is a 1-dimensional invariant subspace $U$ but there is no invariant subspace $V$ with $\mathbb F^2 = U\oplus V$.

(b) How can one conclude from (a) that the group algebra $R = \mathbb F C_2$ has a unital left $R$ module of finite length that is not semisimple? (Educational note: Compare this conclusion with Example 5 in Section 1, which shows that every unital left $\mathbb C G$ module is semisimple if $G$ is a finite group.)

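3. Let $G$ be the abelian group $(\mathbb Z/k\mathbb Z)\otimes_{\mathbb Z}(\mathbb Z/l\mathbb Z)$, where $k$ and $l$ are nonzero integers.

(a) Prove that $G$ is generated by the element $1\otimes 1$.

(b) Prove that if $k$ divides $l$, then $(\mathbb Z/k\mathbb Z)\otimes_{\mathbb Z}(\mathbb Z/l\mathbb Z)\cong(\mathbb Z/k\mathbb Z)\otimes_{\mathbb Z}(\mathbb Z/k\mathbb Z)$.

(c) Using multiplication as a $\mathbb Z$ bilinear form on $(\mathbb Z/k\mathbb Z)\times(\mathbb Z/k\mathbb Z)$, prove that $(\mathbb Z/k\mathbb Z)\otimes_{\mathbb Z}(\mathbb Z/k\mathbb Z)$ has at least $|k|$ elements.

(d) Conclude that $(\mathbb Z/k\mathbb Z)\otimes_{\mathbb Z}(\mathbb Z/l\mathbb Z)\cong\mathbb Z/d\mathbb Z$, where $d = \operatorname{GCD}(k, l)$.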

4. (Fitting’s Lemma) Let $R$ be a ring with identity, let $M$ be a unital left $R$ module, and suppose that $M$ has a composition series. Let $\varphi$ be a member of $\operatorname{End}_R(M)$.

(a) Prove for the composition powers $\varphi^n$ of $\varphi$ that there exists an integer $N$ such that $\ker\varphi^n = \ker\varphi^{n+1}$ and $\operatorname{image}\varphi^n = \operatorname{image}\varphi^{n+1}$ for all $n\geq N$.

(b) Let $\mathcal K$ and $\mathcal I$ be the respective $R$ submodules of $M$ obtained for $n\geq N$ in (a). Prove that $\mathcal K\cap\mathcal I = 0$.

(c) For $x$ in $M$, show that there is some $y$ in $\operatorname{image}\varphi^N$ with $\varphi^N(x) = \varphi^N(y)$.

(d) Deduce from (c) that $M = \mathcal K + \mathcal I$, and conclude from (b) that $M = \mathcal K\oplus\mathcal I$.

(e) Prove that $\varphi$ carries $\mathcal I$ one-one onto $\mathcal I$ and that $(\varphi|_{\mathcal K})^n = 0$ for some $n$.

5. Let $R$ be a ring with identity, and let

$$0\longrightarrow M\xrightarrow{\ \varphi\ } N\xrightarrow{\ \psi\ } P\longrightarrow 0$$

be an exact sequence of unital left $R$ modules. Prove that the following conditions are equivalent:

(i) $N$ is a direct sum $N'\oplus\ker\psi$ of $R$ submodules for some $N'$,

(ii) there exists an $R$ homomorphism $\sigma : P\to N$ such that $\psi\sigma = 1_P$,

(iii) there exists an $R$ homomorphism $\tau : N\to M$ such that $\tau\varphi = 1_M$.

(Educational note: In this case one says that the exact sequence is split.)

6. (a) If $R$ is the ring of quaternions, prove that $\operatorname{End}_R(R)$ is isomorphic to $R$ as a ring.

(b) Give an example of a noncommutative ring with identity for which $\operatorname{End}_R(R)$ is not isomorphic to $R$, and explain why it is not isomorphic.
7. Let $R$ be a ring with identity, and let $M$ be a unital left $R$ module. Prove that $M$ has a unique maximal semisimple $R$ submodule $N$. (Educational note: The $R$ submodule $N$ is called the socle of $M$.)

8. Let $F\subseteq K$ be an inclusion of fields, and let $A$ be an associative algebra with identity over $F$. Proposition 10.24 makes $A\otimes_F K$ into an associative algebra over $F$ with a multiplication such that $(a_1\otimes k_1)(a_2\otimes k_2) = a_1a_2\otimes k_1k_2$. Show that $A\otimes_F K$ is in fact an associative algebra over $K$ with scalar multiplication by $k$ in $K$ equal to left multiplication by $1\otimes k$.

9. A Lie algebra $\mathfrak g$ over a field $K$ is defined, according to Problems 31-35 at the end of Chapter VI, to be a nonassociative algebra over $K$ with a multiplication written $[x, y]$ that is alternating as a function of the pair $(x, y)$ and satisfies $[x, [y, z]] + [y, [z, x]] + [z, [x, y]] = 0$ for all $x, y, z$ in $\mathfrak g$. If $L$ is a field containing $K$, prove that $\mathfrak g^L = \mathfrak g\otimes_K L$ becomes a Lie algebra over $L$ in a unique way such that its multiplication satisfies $[x\otimes c, y\otimes d] = [x, y]\otimes cd$ for $x, y$ in $\mathfrak g$ and $c, d$ in $L$.

10. Let $R$ be a ring with identity, let $A$ be a unital right $R$ module, and let $B$ be a unital left $R$ module. Since $\mathbb Z$ maps into $R$ by $n\mapsto n\cdot 1$, $A$ and $B$ can be considered also as $\mathbb Z$ modules. Form a version of $A\otimes_R B$ with associated $R$ bilinear map $b_1 : A\times B\to A\otimes_R B$, and form a version of $A\otimes_{\mathbb Z}B$ with associated $\mathbb Z$ bilinear map $b_2 : A\times B\to A\otimes_{\mathbb Z}B$. Let $H$ be the subgroup of $A\otimes_{\mathbb Z}B$ generated by all elements $b_2(ar, b) - b_2(a, rb)$ with $a\in A$, $b\in B$, $r\in R$, and let $q : A\otimes_{\mathbb Z}B\to(A\otimes_{\mathbb Z}B)/H$ be the quotient homomorphism. Prove that there is an abelian group isomorphism $\Phi : (A\otimes_{\mathbb Z}B)/H\to A\otimes_R B$ such that $\Phi(q(b_2(a, b))) = b_1(a, b)$ for all $a\in A$ and $b\in B$.

11. Let $R$ be a commutative ring with identity, and let $\mathcal C$ be the category of all commutative associative $R$ algebras with identity. Prove that if $A_1$ and $A_2$ are in $\operatorname{Obj}(\mathcal C)$, then $(A_1\otimes_R A_2, \{i_1, i_2\})$ is a coproduct, where $i_1 : A_1\to A_1\otimes_R A_2$ is given by $i_1(a_1) = a_1\otimes 1$ and $i_2 : A_2\to A_1\otimes_R A_2$ is given by $i_2(a_2) = 1\otimes a_2$.

Problems 12-20 partition simple left $R$ modules into isomorphism types, where $R$ is a ring with identity. For each simple left $R$ module $E$ and each unital left $R$ module $M$, one forms the sum $M_E$ of all simple $R$ submodules that are isomorphic to $E$ and calls it an isotypic $R$ submodule of $M$. The problems introduce a calculus for working with the members of $\operatorname{End}_R(M_E)$ in terms of right vector spaces over a certain division ring. They show that if $M$ is semisimple, then $M$ is the direct sum of all its isotypic $R$ submodules, each of these is mapped to itself by every member of $\operatorname{End}_R(M)$, and consequently one can understand $\operatorname{End}_R(M)$ in terms of right vector spaces over certain division rings. These problems generalize and extend Problems 47-52 at the end of Chapter VII, which in effect deal with what happens for the ring $\mathbb C G$ when $G$ is a finite group; however, the material of Chapter VII is not prerequisite for these problems. The following notation is in force: $M$ is any unital left $R$ module, $E$ is a simple left $R$ module, $D_E = \operatorname{Hom}_R(E, E)$ is the ring known from Proposition 10.4b to be a division ring,

$$M_E = (\text{sum of all $R$ submodules of $M$ that are $R$ isomorphic to $E$}),$$

and $M^E = \operatorname{Hom}_R(E, M)$.

Unital right $D_E$ modules are right vector spaces over $D_E$. In Problems 18-20, $\mathcal E$ denotes a set of representatives of all $R$ isomorphism classes of simple left $R$ modules.

12. Prove that

(a) $M_E$ is a direct sum of simple $R$ modules that are $R$ isomorphic to $E$,

(b) the image of every mapping in $M^E$ belongs to $M_E$,

(c) redefinition of the range from $M$ to $M_E$ defines an isomorphism $M^E\cong\operatorname{Hom}_R(E, M_E)$ of abelian groups.

13. Prove that

(a) $M^E$ is a unital right $D_E$ module under composition of $R$ homomorphisms,

(b) $E$ is a unital left $D_E$ module under the operation of the members of $D_E$,

(c) the left $R$ module action and the left $D_E$ module action on $E$ commute with each other.

14. Show that $M^E\otimes_{D_E}E$ is a unital left $R$ module in such a way that $r(m\otimes e) = m\otimes re$.

15. Prove that there is a well-defined $R$ homomorphism $\Phi : M^E\otimes_{D_E}E\to M$ such that $\Phi(\psi\otimes e) = \psi(e)$ and such that $\Phi$ is an $R$ isomorphism onto $M_E$.

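16. Prove that the left $R$ submodules $N$ of $M_E$ are in one-one correspondence with the right $D_E$ vector subspaces $W$ of $M^E$ by the maps

$$N\mapsto\operatorname{Hom}_R(E, N)\subseteq\operatorname{Hom}_R(E, M) = M^E\quad\text{if } N\subseteq M_E$$

and

$$W\mapsto W\otimes_{D_E}E\subseteq M^E\otimes_{D_E}E\cong M_E\quad\text{if } W\subseteq M^E.$$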

17. Prove for any unital left $R$ module $N$ that there is a canonical isomorphism

$$\operatorname{Hom}_R(M_E, N_E)\cong\operatorname{Hom}_{D_E}(M^E, N^E)$$

of abelian groups defined as follows. Suppose $\varphi$ is in $\operatorname{Hom}_R(M_E, N_E)$. Composition with $\varphi$ carries $\operatorname{Hom}_R(E, M)$ into $\operatorname{Hom}_R(E, N)$; this map respects the right action of $D_E$ and hence induces a map

$$\varphi^E\in\operatorname{Hom}_{D_E}(M^E, N^E).$$

The isomorphism is given in terms of the isomorphisms $\Phi_M$ for $M$ and $\Phi_N$ for $N$ in Problem 15 by

$$\varphi(\Phi_M(\psi\otimes e)) = \Phi_N(\varphi^E(\psi)\otimes e)\quad\text{for } \psi\in M^E.$$
18. If $M$ is semisimple, prove that

$$M = \bigoplus_{E\in\mathcal E}M_E\cong\bigoplus_{E\in\mathcal E}(M^E\otimes_{D_E}E).$$

19. Still with $M$ semisimple, prove that the left $R$ submodules of $M$ are in one-one correspondence with families $\{W_E\mid E\in\mathcal E\}$ of right $D_E$ vector subspaces of $M^E$.

20. Suppose that $M$ and $N$ are two semisimple left $R$ modules. Prove that there is a canonical isomorphism of abelian groups

$$\operatorname{Hom}_R(M, N)\cong\prod_{E\in\mathcal E}\operatorname{Hom}_{D_E}(M^E, N^E).$$

More precisely prove that an $R$ module map from $M$ to $N$ is specified by giving, for a representative $E$ of each class of simple left $R$ modules, an arbitrary right vector-space map from $M^E$ to $N^E$.


APPENDIX 


Abstract.  This  appendix  treats  some  topics  that  are  likely  to  be  well  known  by  some  readers  and  less 
known  by  others.  Most  of  it  already  comes  into  play  by  Chapter  II.  Section  A1  deals  with  set  theory 
and  with  functions:  it  discusses  the  role  of  formal  set  theory,  it  works  in  a  simplified  framework  that 
avoids  too  much  formalism  and  the  standard  pitfalls,  it  establishes  notation,  and  it  mentions  some 
formulas.  Some  emphasis  is  put  on  distinguishing  the  image  and  the  range  of  a  function,  since  this 
distinction  is  important  in  algebra  and  algebraic  topology. 

Section A2 defines equivalence relations and establishes the basic fact that they lead to a partitioning of the underlying set into equivalence classes.

Section  A3  reviews  the  construction  of  rational  numbers  from  the  integers,  and  real  numbers 
from  the  rational  numbers.  From  there  it  concentrates  on  the  solvability  within  the  real  numbers  of 
certain  polynomial  equations. 

Section  A4  is  a  quick  review  of  complex  numbers,  real  and  imaginary  parts,  complex  conjugation, 
and  absolute  value. 

Sections  A5  and  A6  return  to  set  theory.  Section  A5  defines  partial  orderings  and  includes  Zorn's 
Lemma,  which  is  a  powerful  version  of  the  Axiom  of  Choice,  while  Section  A6  concerns  cardinality. 


A1. Sets and Functions

Algebra  typically  makes  use  of  an  informal  notion  of  set  theory  and  notation  for 
it  in  which  sets  are  described  by  properties  of  their  elements  and  by  operations 
on  sets.  This  informal  set  theory,  if  allowed  to  be  too  informal,  runs  into  certain 
paradoxes,  such  as  Russell’s  paradox:  “If  S  is  the  set  of  all  sets  that  do  not 
contain  themselves  as  elements,  is  S  a  member  of  S  or  is  it  not?”  The  conclusion 
of  Russell’s  paradox  is  that  the  “set”  of  all  sets  that  do  not  contain  themselves  as 
elements  is  not  in  fact  a  set. 

Mathematicians’ experience is that such pitfalls can be avoided completely by working within some formal axiom system for sets, of which there are several that are well established. A basic one is “Zermelo-Fraenkel set theory,” and the remarks in this section refer specifically to it but apply to the others at least to some extent.1

1 Mathematicians have no proof that this technique avoids problems completely. Such a proof would be a proof of the consistency of a version of mathematics in which one can construct the integers, and it is known that this much of mathematics cannot prove its own consistency unless it is in fact inconsistent.

The standard logical paradoxes are avoided by having sets, elements (or “entities”), and a membership relation $\in$ such that $a \in S$ is a meaningful statement, true or false, if and only if a is an element and S is a set. The terms set, element, and $\in$ are taken to be primitive terms of the theory that are in effect defined by a system of axioms. The axioms ensure the existence of many sets, including infinite sets, and operations on sets that lead to other sets. To make full use of this axiom system, one has to regard it as occurring in the framework of certain rules of logic that tell the forms of basic statements (namely, $a = b$, $a \in S$, and “S is a set”), the connectives for creating complicated statements from simple ones (“or,” “and,” “not,” and “if . . . then”), and the way that quantifiers work (“there exists” and “for all”).

Working  rigorously  with  such  a  system  would  likely  make  the  development 
of mathematics unwieldy, and it might well obscure important patterns and directions. In practice, therefore, one compromises between using a formal axiom
system  and  working  totally  informally;  let  us  say  that  one  works  “informally  but 
carefully.”  The  logical  problems  are  avoided  not  by  rigid  use  of  an  axiom  system, 
but  by  taking  care  that  sets  do  not  become  too  “large”:  one  limits  the  sets  that  one 
uses  to  those  obtained  from  other  sets  by  set-theoretic  operations  and  by  passage 
to  subsets.2 

A  feature  of  the  axiom  system  lying  behind  working  informally  but  carefully 
is  that  it  does  not  preclude  the  existence  of  additional  sets  beyond  those  forced  to 
exist by the axioms. Thus, for example, in the subject of coin-tossing within probability, it is normal to work with the set of possible outcomes as S = {heads, tails}
even  though  it  is  not  immediately  apparent  that  requiring  this  S  to  be  a  set  does 
not  introduce  some  contradiction. 

It is worth emphasizing that the points of the theory at which one takes particular care vary somewhat from subject to subject within mathematics. For example
it  is  sometimes  of  interest  in  calculus  of  several  variables  to  distinguish  between 
the  range  of  a  function  and  its  image  in  a  way  that  will  be  mentioned  below,  but  it 
is  usually  not  too  important.  In  homological  algebra,  however,  the  distinction  is 
extremely  important,  and  the  subject  loses  a  great  deal  of  its  impact  if  one  blurs 
the  notions  of  range  and  image. 

Some  references  for  set  theory  that  are  appropriate  for  reading  once  are 
Halmos’s Naive Set Theory, Hayden-Kennison’s Zermelo-Fraenkel Set Theory,
and  Chapter  0  and  the  appendix  of  Kelley’s  General  Topology.  The  Kelley  book 
is  one  that  uses  the  word  “class”  as  a  primitive  term  more  general  than  “set”;  it 
develops  von  Neumann  set  theory. 

All that being said, let us now introduce the familiar terms, constructions, and notation that one associates with set theory. To cut down on repetition, one allows some alternative words for “set,” such as family and collection. The word “class” is used by some authors as a synonym for “set,” but the word class is used in some set-theory axiom systems to refer to a more general notion than “set,” and it will be useful to preserve this possibility. Thus a class can be a set, but we allow ourselves to speak, for example, of the class of all groups even though this class is too large to be a set. Alternative terms for “element” are member and point; we shall not use the term “entity.” Instead of writing $\in$ systematically, we allow ourselves to write “in.” Generally, we do not use $\in$ in sentences of text as an abbreviation for an expression like “is in” that contains a verb.

2 Not every set so obtained is to be regarded as “constructed.” The Axiom of Choice, which we come to shortly, is an existence statement for elements in products of sets, and the result of applying the axiom is a set that can hardly be viewed as “constructed.”

If A and B are two sets, some familiar operations on them are the union $A \cup B$, the intersection $A \cap B$, and the difference $A - B$, all defined in the usual way in terms of the elements they contain. Notation for the difference of sets varies from author to author; some other authors write $A \setminus B$ or $A \sim B$ for difference, but this book uses $A - B$. If one is thinking of A as a universe, one may abbreviate $A - B$ as $B^c$, the complement of B in A. The empty set $\emptyset$ is a set, and so is the set of all subsets of a set A, which is sometimes denoted by $2^A$. Inclusion of a subset A in a set B is written $A \subseteq B$ or $B \supseteq A$; then B is a superset of A. Inclusion that does not permit equality is denoted by $A \subsetneq B$ or $B \supsetneq A$; in this case one says that A is a proper subset of B or that A is properly contained in B.

If A is a set, the singleton $\{A\}$ is a set with just the one member A. Another operation is unordered pair, whose formal definition is $\{A, B\} = \{A\} \cup \{B\}$ and whose informal meaning is a set of two elements in which we cannot distinguish either element over the other. Still another operation is ordered pair, whose formal definition is $(A, B) = \{\{A\}, \{A, B\}\}$. It is customary to think of an ordered pair as a set with two elements in which one of the elements can be distinguished as coming first.3
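The formal definitions of unordered and ordered pairs can be tried out directly. The following minimal Python sketch is an illustration, not part of the text: it models sets of hashable objects as frozensets, and the helper names kpair, first, and second are hypothetical. It shows that $(a, b)$ and $(b, a)$ differ when $a \ne b$, while $\{a, b\}$ and $\{b, a\}$ do not, and that both coordinates can be recovered from $\{\{a\}, \{a, b\}\}$.

```python
# Illustrative sketch; kpair, first, and second are hypothetical names.

def kpair(a, b):
    """The ordered pair (a, b) = {{a}, {a, b}}, modeled with frozensets."""
    return frozenset({frozenset({a}), frozenset({a, b})})


def first(p):
    """First coordinate: the element common to every member of the pair."""
    return next(iter(frozenset.intersection(*p)))


def second(p):
    """Second coordinate: the element lying in exactly one member, or the
    first coordinate again when the pair has collapsed to {{a}}."""
    common = frozenset.intersection(*p)
    rest = frozenset.union(*p) - common
    return next(iter(rest)) if rest else next(iter(common))


unordered_same = frozenset({1, 2}) == frozenset({2, 1})   # {A, B} ignores order
ordered_differ = kpair(1, 2) != kpair(2, 1)               # (A, B) does not
```

The degenerate case $(a, a) = \{\{a\}\}$ is handled by the fallback in second.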

Let A and B be two sets. The set of all ordered pairs of an element of A and an element of B is a set denoted by $A \times B$; it is called the product of A and B or the Cartesian product. A relation between a set A and a set B is a subset of $A \times B$. Functions, which are to be defined in a moment, provide examples. Two examples of relations that are usually not functions are “equivalence relations,” which are discussed in Section A2, and “partial orderings,” which are discussed in Section A5.

3 Unfortunately a “sequence” gets denoted by $\{x_1, x_2, \dots\}$ or $\{x_n\}_{n=1}^\infty$. If its notation were really consistent with the above definitions, we might infer, inaccurately, that the order of the terms of the sequence does not matter. The notation for unordered pairs, ordered pairs, and sequences is, however, traditional, and it will not be changed here.

If A and B are sets, a relation f between A and B is said to be a function, written $f : A \to B$, if for each $x \in A$, there is exactly one $y \in B$ such that $(x, y)$ is in f. If $(x, y)$ is in f, we write $f(x) = y$. In this informal but careful definition of function, the function consists of more than just a set of ordered pairs; it consists of the set of ordered pairs regarded as a subset of $A \times B$. This careful definition makes it meaningful to say that the set A is the domain, the set B is the range,4 and the subset of $y \in B$ such that $y = f(x)$ for some $x \in A$ is the image of f. The image is also denoted by $f(A)$. Sometimes a function f is described in terms of what happens to typical elements, and then the notation is $x \mapsto f(x)$ or $x \mapsto y$, possibly with y given by some formula or by some description in words about how it is obtained from x. Sometimes a function f is written as $f(\,\cdot\,)$, with a dot indicating the placement of the variable; this notation is especially helpful in working with restrictions, which we come to in a moment, and with functions of two variables when one of the variables is held fixed. This notation is useful also for functions that involve unusual symbols, such as the absolute value function $x \mapsto |x|$, which in this notation becomes $|\,\cdot\,|$. The word map or mapping is used for “function” and for the operation of a function, especially when a geometric setting for the function is of importance.
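The careful definition can be mirrored in code by storing the range alongside the set of ordered pairs. In this hypothetical Python sketch (the class Function and its field names are illustrative, not from the text), two functions with the same ordered pairs but different ranges compare unequal, and only one of them is onto its range.

```python
# Illustrative sketch; the class and field names are hypothetical.
from dataclasses import dataclass


@dataclass(frozen=True)
class Function:
    """A function in the careful sense: domain, range, and graph together.

    `graph` is a set of ordered pairs (x, y). The checks below enforce that it
    is a subset of domain x codomain with exactly one pair for each x.
    """
    domain: frozenset
    codomain: frozenset   # the text's "range"; named so as not to shadow range()
    graph: frozenset

    def __post_init__(self):
        xs = [x for x, _ in self.graph]
        if any(y not in self.codomain for _, y in self.graph):
            raise ValueError("graph is not a subset of domain x range")
        if len(xs) != len(set(xs)) or set(xs) != set(self.domain):
            raise ValueError("need exactly one pair (x, y) for each x in the domain")

    def __call__(self, x):
        return next(y for u, y in self.graph if u == x)

    def image(self):
        return frozenset(y for _, y in self.graph)

    def is_onto(self):
        return self.image() == self.codomain


pairs = frozenset({(1, "a"), (2, "a"), (3, "b")})
f = Function(frozenset({1, 2, 3}), frozenset({"a", "b", "c"}), pairs)
g = Function(frozenset({1, 2, 3}), frozenset({"a", "b"}), pairs)
```

Here f and g have the same image $\{a, b\}$, but g is onto its range and f is not. Comparing the sets of ordered pairs alone would identify them.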

Often mathematicians are not so careful with the definition of function. Depending on the degree of informality that is allowed, one may occasionally refer to a function as $f(x)$ when it should be called f or $x \mapsto f(x)$. If any confusion is possible, it is wise to use the more rigorous notation. Another habit of informality is to regard a function $f : A \to B$ as simply a set of ordered pairs. Thus two functions $f_1 : A \to B$ and $f_2 : A \to C$ become the same if $f_1(a) = f_2(a)$ for all a in A. With the less-careful definition, the notion of the range of a function is not really well defined. The less-careful definition can lead to trouble in algebra and topology, but it does not often lead to trouble in analysis until one gets to a level where algebra and analysis merge somewhat. One place where it comes into play in algebra is in the notion of an exact sequence of three abelian groups

$$A \xrightarrow{\ \varphi\ } B \xrightarrow{\ \psi\ } C,$$

which is defined as a system of three abelian groups and homomorphisms as indicated such that the kernel of $\psi$ equals the image of $\varphi$. In this definition one is not free to adjust B to be the image of $\varphi$ since that adjustment will affect the kernel of $\psi$ as well.
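A small finite example makes the last point concrete. In this Python sketch the maps are chosen here for illustration and are not taken from the text: $\varphi : \mathbb{Z}/2 \to \mathbb{Z}/4$ is $x \mapsto 2x$ and $\psi : \mathbb{Z}/4 \to \mathbb{Z}/2$ is reduction mod 2, so the sequence is exact at the middle group. With $\psi$ replaced by the zero map the sequence is not exact, yet shrinking B to the image of $\varphi$ would make it look exact, because the kernel is then computed inside a smaller set.

```python
# Illustrative sketch; the groups and maps are a hypothetical example.

def is_exact_at_middle(A, B, phi, psi):
    """Check ker(psi) == image(phi), computing the kernel inside the given set B."""
    image_phi = {phi(a) for a in A}
    kernel_psi = {b for b in B if psi(b) == 0}
    return kernel_psi == image_phi


Z2, Z4 = range(2), range(4)          # elements of Z/2 and Z/4 as residues


def phi(x):
    return (2 * x) % 4               # Z/2 -> Z/4, x |-> 2x


def psi(y):
    return y % 2                     # Z/4 -> Z/2, reduction mod 2


def zero(y):
    return 0                         # the zero homomorphism Z/4 -> Z/2


print(is_exact_at_middle(Z2, Z4, phi, psi))       # True:  ker(psi) = {0, 2} = im(phi)
print(is_exact_at_middle(Z2, Z4, phi, zero))      # False: ker(zero) is all of Z/4
shrunk_B = {phi(a) for a in Z2}                   # "adjust B to be the image of phi"
print(is_exact_at_middle(Z2, shrunk_B, phi, zero))  # True, but only because B shrank
```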

The set of all functions from a set A to a set B is a set. It is sometimes denoted by $B^A$. The special case $2^A$ that arises with subsets comes by regarding 2 as a set $\{1, 2\}$ and identifying a function f from A into $\{1, 2\}$ with the subset of all elements x of A for which $f(x) = 1$.
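For finite sets the identification can be checked by listing every function. This Python sketch is illustrative (the helper names are hypothetical); it represents a function $A \to B$ as a dict, enumerates all $|B|^{|A|}$ of them, and confirms that the functions $A \to \{1, 2\}$ give each subset of A exactly once.

```python
# Illustrative sketch; all_functions and subset_of are hypothetical names.
from itertools import product


def all_functions(A, B):
    """Yield every function A -> B as a dict; there are len(B) ** len(A) of them."""
    A = list(A)
    for values in product(B, repeat=len(A)):
        yield dict(zip(A, values))


def subset_of(f):
    """The subset {x in A | f(x) = 1} attached to a function f: A -> {1, 2}."""
    return frozenset(x for x, v in f.items() if v == 1)


A = ["p", "q", "r"]
functions = list(all_functions(A, [1, 2]))       # the set 2^A in function form
subsets = {subset_of(f) for f in functions}      # one subset per function
```

With A empty there is exactly one function, the empty one, matching the count $|B|^0 = 1$.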

If a subset B of a set A may be described by some distinguishing property P of its elements, we may write this relationship as $B = \{x \in A \mid P\}$. For example the function f in the previous paragraph is identified with the subset $\{x \in A \mid f(x) = 1\}$. Another example is the image of a general function $f : A \to B$, namely $f(A) = \{y \in B \mid y = f(x) \text{ for some } x \in A\}$. Still more generally along these lines, if E is any subset of A, then $f(E)$ denotes the set $\{y \in B \mid y = f(x) \text{ for some } x \in E\}$. Some authors use a colon or semicolon or comma instead of a vertical line in this notation.

4 Some authors refer to B as the codomain.

This book frequently uses sets denoted by expressions like $\bigcup_{x \in S} A_x$, an indexed union, where S is a set that is usually nonempty. If S is the set $\{1, 2\}$, this reduces to $A_1 \cup A_2$. In the general case it is understood that we have an unnamed function, say f, given by $x \mapsto A_x$, having domain S and range the set of all subsets of an unnamed set T, and $\bigcup_{x \in S} A_x$ is the set of all $y \in T$ such that y is in $A_x$ for some $x \in S$. When S is understood, we may write $\bigcup_x A_x$ instead of $\bigcup_{x \in S} A_x$. Indexed intersections $\bigcap_{x \in S} A_x$ are defined similarly, and this time it is essential to disallow S empty because otherwise the intersection cannot be a set in any useful set theory.
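Representing the unnamed function $x \mapsto A_x$ as a Python dict gives a direct transcription of these definitions. In this sketch (function names are hypothetical), an empty index set is allowed for unions, which then give the empty set, but rejected for intersections, matching the restriction in the text.

```python
# Illustrative sketch; indexed_union and indexed_intersection are hypothetical names.

def indexed_union(family):
    """Union of the sets A_x, where `family` maps each index x in S to A_x."""
    return frozenset().union(*family.values())


def indexed_intersection(family):
    """Intersection of the sets A_x; the index set S must be nonempty."""
    if not family:
        raise ValueError("indexed intersection over an empty index set")
    return frozenset.intersection(*(frozenset(A) for A in family.values()))


family = {1: {1, 2, 3}, 2: {2, 3, 4}, 3: {3, 5}}
```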

There is also an indexed Cartesian product $\mathop{\times}_{x \in S} A_x$ that specializes in the case that $S = \{1, 2\}$ to $A_1 \times A_2$. Usually S is assumed nonempty. This Cartesian product is the set of all functions f from S into $\bigcup_{x \in S} A_x$ such that $f(x)$ is in $A_x$ for all $x \in S$. In the special case that S is $\{1, \dots, n\}$, the Cartesian product is the set of ordered n-tuples from n sets $A_1, \dots, A_n$ and may be denoted by $A_1 \times \cdots \times A_n$; its members may be denoted by $(a_1, \dots, a_n)$ with $a_j \in A_j$ for $1 \le j \le n$. When the factors of a Cartesian product have some additional algebraic structure, the notation for the Cartesian product is often altered; for example the Cartesian product of groups $A_x$ is denoted by $\prod_{x \in S} A_x$.
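For finitely many finite factors, the description of the product as a set of functions can be computed outright. This Python sketch is illustrative (cartesian_product is a hypothetical name) and returns each member as a dict sending an index x to an element of $A_x$. An empty factor makes the whole product empty, and for a finite index set with nonempty factors the product is visibly nonempty, consistent with the Axiom of Choice reducing to a theorem in the finite case.

```python
# Illustrative sketch; cartesian_product is a hypothetical name.
from itertools import product


def cartesian_product(family):
    """Yield each f with f(x) in A_x for every index x, represented as a dict.

    `family` maps each index x in S to a finite iterable A_x.
    """
    index = list(family)
    for choice in product(*(family[x] for x in index)):
        yield dict(zip(index, choice))


family = {"a": [0, 1], "b": ["u", "v", "w"]}
members = list(cartesian_product(family))
```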

It  is  completely  normal  in  algebra,  and  it  is  the  practice  in  this  book,  to  take 
the  following  axiom  as  part  of  one’s  set  theory;  the  axiom  is  customarily  used 
without  specific  mention. 

Axiom  of  Choice.  The  Cartesian  product  of  nonempty  sets  is  nonempty. 

If the index set is finite, then the Axiom of Choice reduces to a theorem of set theory. The axiom is often used quite innocently with a countably infinite index set. For example a theorem of analysis asserts that any bounded sequence $\{a_n\}$ of real numbers has a subsequence converging to $\limsup a_n$, and the proof constructs one member of the subsequence at a time. When the proof is written in such a way that these members have some flexibility in their definitions, the Axiom of Choice is usually being invoked. The proof can be rewritten so that the members of the subsequence have specific definitions, such as “the term $a_n$ such that n is the smallest integer satisfying such-and-such properties.” In this case the axiom is not being invoked. In fact, one can often rewrite proofs involving a countably infinite choice so that they involve specific definitions and therefore avoid invoking the axiom, but there is no point in undertaking this rewriting. In algebra the axiom is often invoked in situations in which the index set is uncountable; selection of a representative from each of uncountably many equivalence classes is such a choice if all equivalence classes have more than one element.
From the Axiom of Choice, one can deduce a powerful tool known as Zorn’s
Lemma,  whose  use  it  is  customary  to  acknowledge.  Zorn’s  Lemma  appears  in 
Section  A5. 

If $f : A \to B$ is a function and B is a subset of $B'$, then f can be regarded as a function with range $B'$ in a natural way. Namely, the set of ordered pairs is unchanged but is to be regarded as a subset of $A \times B'$ rather than $A \times B$.

Let $f : A \to B$ and $g : B \to C$ be two functions such that the range of f equals the domain of g. The composition $g \circ f : A \to C$, written sometimes as $gf : A \to C$, is the function with $(g \circ f)(x) = g(f(x))$ for all x. Because of the construction in the previous paragraph, it is meaningful to define the composition more generally when the range of f is merely a subset of the domain of g.

A function $f : A \to B$ is said to be one-one if $f(x_1) \ne f(x_2)$ whenever $x_1$ and $x_2$ are distinct members of A. The function is said to be onto, or often “onto B,” if its image equals its range. The terminology “onto B” avoids confusion: it specifies the image and thereby guards against the use of the less careful definition of function mentioned above. A mathematical audience often contains some people who use the more careful definition of function and some people who use the less careful definition. For the latter kind of person, a function is always onto something, namely its image, and a statement that a particular function is onto might be regarded as a tautology. A function from one set to another is said to put the sets in one-one correspondence if the function is one-one and onto.

When a function $f : A \to B$ is one-one and is onto B, there exists a function $g : B \to A$ such that $g \circ f$ is the identity function on A and $f \circ g$ is the identity function on B. The function g is unique, and it is defined by the condition, for $y \in B$, that $g(y)$ is the unique $x \in A$ with $f(x) = y$. The function g is called the inverse function of f and is often denoted by $f^{-1}$.

Conversely if $f : A \to B$ has an inverse function, then f is one-one and is onto B. The reason is that a composition $g \circ f$ can be one-one only if f is one-one, and in addition, that a composition $f \circ g$ can be onto the range of f only if f is onto its range.
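For finite sets, the construction of $f^{-1}$ and both identity checks can be carried out mechanically. In this Python sketch (inverse and compose are hypothetical helper names, and functions are dicts), the inverse is built from the condition that $g(y)$ is the unique x with $f(x) = y$, and the code reports which property fails when f is not one-one or not onto.

```python
# Illustrative sketch; inverse and compose are hypothetical names.

def inverse(f, domain, codomain):
    """Inverse of the dict-based function f: domain -> codomain.

    Raises ValueError unless f is one-one and onto the codomain.
    """
    if set(f) != set(domain) or not set(f.values()) <= set(codomain):
        raise ValueError("f is not a function from domain to codomain")
    g = {}
    for x, y in f.items():
        if y in g:
            raise ValueError(f"not one-one: {g[y]!r} and {x!r} both go to {y!r}")
        g[y] = x
    if set(g) != set(codomain):
        raise ValueError("not onto the codomain")
    return g


def compose(g, f):
    """g o f, assuming every value of f lies in the domain of g."""
    return {x: g[f[x]] for x in f}


f = {1: "a", 2: "b", 3: "c"}
g = inverse(f, {1, 2, 3}, {"a", "b", "c"})
```

Here compose(g, f) is the identity on $\{1, 2, 3\}$ and compose(f, g) is the identity on $\{a, b, c\}$.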

If $f : A \to B$ is a function and E is a subset of A, the restriction of f to E, denoted by $f\big|_E$, is the function $f\big|_E : E \to B$ consisting of all ordered pairs $(x, f(x))$ with $x \in E$, this set being regarded as a subset of $E \times B$, not of $A \times B$. One especially common example of a restriction is restriction to one of the variables of a function of two variables, and then the idea of using a dot in place of a variable can be helpful notationally. Thus the function of two variables might be indicated by f or $(x, y) \mapsto f(x, y)$, and the restriction to the first variable, for fixed value of the second variable, would be $f(\,\cdot\,, y)$ or $x \mapsto f(x, y)$.

We conclude this section with a discussion of direct and inverse images of sets under functions. If $f : A \to B$ is a function and E is a subset of A, we have defined $f(E) = \{y \in B \mid y = f(x) \text{ for some } x \in E\}$. This is the same as the image of $f\big|_E$ and is frequently called the image or direct image of E under f. The notion of direct image does not behave well with respect to some set-theoretic operations: it respects unions but not intersections. In the case of unions, we have

$$f\Big(\bigcup_{s \in S} E_s\Big) = \bigcup_{s \in S} f(E_s);$$

the inclusion $\supseteq$ follows since $f\big(\bigcup_{s \in S} E_s\big) \supseteq f(E_s)$ for each s, and the inclusion $\subseteq$ follows because any member of the left side is f of a member of some $E_s$. In the case of intersections, the question $f(E \cap F) = f(E) \cap f(F)$ can easily have a negative answer, the correct general statement being $f(E \cap F) \subseteq f(E) \cap f(F)$. An example with equality failing occurs when $A = \{1, 2, 3\}$, $B = \{1, 2\}$, $f(1) = f(3) = 1$, $f(2) = 2$, $E = \{1, 2\}$, and $F = \{2, 3\}$, because $f(E \cap F) = \{2\}$ and $f(E) \cap f(F) = \{1, 2\}$.
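The counterexample for intersections, and the identity for unions, can be confirmed by direct computation. This Python sketch uses exactly the sets and function of the example, with the hypothetical helper direct_image computing $f(E)$ as the image of the restriction of f to E.

```python
# Illustrative sketch; direct_image is a hypothetical name.

def direct_image(f, E):
    """f(E) = {f(x) : x in E}, the image of the restriction of f to E."""
    return {f[x] for x in E}


f = {1: 1, 2: 2, 3: 1}          # the function A -> B from the example
E, F = {1, 2}, {2, 3}

lhs = direct_image(f, E & F)                       # f(E ∩ F)
rhs = direct_image(f, E) & direct_image(f, F)      # f(E) ∩ f(F)
union_ok = direct_image(f, E | F) == direct_image(f, E) | direct_image(f, F)
```

The failure comes from f not being one-one: the points 1 and 3 lie in different sets but have the same image.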

If $f : A \to B$ is a function and E is a subset of B, the inverse image of E under f is the set $f^{-1}(E) = \{x \in A \mid f(x) \in E\}$. This is well defined even if f does not have an inverse function. (If f does have an inverse function $f^{-1}$, then the inverse image of E under f coincides with the direct image of E under $f^{-1}$.)

Unlike direct images, inverse images behave well under set-theoretic operations. If $f : A \to B$ is a function and $\{E_s \mid s \in S\}$ is a set of subsets of B, then

$$f^{-1}\Big(\bigcap_{s \in S} E_s\Big) = \bigcap_{s \in S} f^{-1}(E_s),$$

$$f^{-1}\Big(\bigcup_{s \in S} E_s\Big) = \bigcup_{s \in S} f^{-1}(E_s),$$

$$f^{-1}(E_s^c) = \big(f^{-1}(E_s)\big)^c.$$

In the third of these identities, the complement on the left side is taken within B, and the complement on the right side is taken within A. To prove the first identity, we observe that $f^{-1}\big(\bigcap_{s \in S} E_s\big) \subseteq f^{-1}(E_s)$ for each $s \in S$ and hence $f^{-1}\big(\bigcap_{s \in S} E_s\big) \subseteq \bigcap_{s \in S} f^{-1}(E_s)$. For the reverse inclusion, if x is in $\bigcap_{s \in S} f^{-1}(E_s)$, then x is in $f^{-1}(E_s)$ for each s and thus $f(x)$ is in $E_s$ for each s. Hence $f(x)$ is in $\bigcap_{s \in S} E_s$, and x is in $f^{-1}\big(\bigcap_{s \in S} E_s\big)$. This proves the reverse inclusion. The second and third identities are proved similarly.
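An exhaustive check on a small example illustrates why no hypothesis on f is needed. This Python sketch is an illustration only: it picks a hypothetical f that is neither one-one nor onto, and it verifies the two-set case of the three identities ($S = \{1, 2\}$) for every pair of subsets of B, taking complements in B on one side and in A on the other.

```python
# Illustrative sketch; the sets A, B and the function f are hypothetical.
from itertools import combinations


def preimage(f, E):
    """f^{-1}(E) = {x in A | f(x) in E}; f is a dict whose keys form A."""
    return {x for x, y in f.items() if y in E}


def all_subsets(X):
    X = list(X)
    return [set(c) for r in range(len(X) + 1) for c in combinations(X, r)]


A, B = {1, 2, 3, 4}, {"a", "b", "c"}
f = {1: "a", 2: "a", 3: "b", 4: "b"}     # neither one-one nor onto B


def identities_hold(E1, E2):
    return (preimage(f, E1 & E2) == preimage(f, E1) & preimage(f, E2)
            and preimage(f, E1 | E2) == preimage(f, E1) | preimage(f, E2)
            and preimage(f, B - E1) == A - preimage(f, E1))   # complements in B, in A


all_ok = all(identities_hold(E1, E2) for E1 in all_subsets(B) for E2 in all_subsets(B))
```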


A2.  Equivalence  Relations 

An equivalence relation on a set S is a relation between S and itself, i.e., is a subset of $S \times S$, satisfying three defining properties. We use notation like $a \sim b$, read “a is equivalent to b,” to mean that the ordered pair $(a, b)$ is a member of the relation, and we say that $\sim$ is the equivalence relation. The three defining properties are

(i) $a \sim a$ for all a in S, i.e., $\sim$ is reflexive,

(ii) $a \sim b$ implies $b \sim a$ if a and b are in S, i.e., $\sim$ is symmetric,

(iii) $a \sim b$ and $b \sim c$ together imply $a \sim c$ if a, b, and c are in S, i.e., $\sim$ is transitive.

An example occurs with S equal to the set Z of integers with $a \sim b$ meaning that the difference $a - b$ is even. The properties hold because (i) 0 is even, (ii) the negative of an even integer is even, and (iii) the sum of two even integers is even.

There is one fundamental result about abstract equivalence relations. The equivalence class of a, written $[a]$ for now, is the set of all members b of S such that $a \sim b$.

Proposition. If $\sim$ is an equivalence relation on a set S, then any two equivalence classes are disjoint or equal, and S is the union of all the equivalence classes.

PROOF. Let $[a]$ and $[b]$ be the equivalence classes of members a and b of S. If $[a] \cap [b] \ne \emptyset$, choose c in the intersection. Then $a \sim c$ and $b \sim c$. By (ii), $c \sim b$, and then by (iii), $a \sim b$. If d is any member of $[b]$, then $b \sim d$. From (iii), $a \sim b$ and $b \sim d$ together imply $a \sim d$. Thus $[b] \subseteq [a]$. Reversing the roles of a and b, we see that $[a] \subseteq [b]$ also, whence $[a] = [b]$. This proves the first conclusion. The second conclusion follows from (i), which ensures that a is in $[a]$, hence that every member of S lies in some equivalence class. □

Example. With the equivalence relation on Z that $a \sim b$ if $a - b$ is even, there are two equivalence classes: the subset of even integers and the subset of odd integers.
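On a finite set the partition can be computed directly, and the proposition is what justifies a shortcut: since any two classes are disjoint or equal, each new element needs to be compared with only one representative of each class found so far. This Python sketch is illustrative (equivalence_classes is a hypothetical name) and runs on a finite window of Z, since Z itself cannot be listed.

```python
# Illustrative sketch; equivalence_classes is a hypothetical name.

def equivalence_classes(S, related):
    """Partition the finite set S into classes of the equivalence relation `related`.

    By the Proposition, comparing a new element with one representative of
    each existing class is enough to decide which class (if any) it joins.
    """
    classes = []
    for a in S:
        for cls in classes:
            if related(cls[0], a):
                cls.append(a)
                break
        else:                      # no existing class matched
            classes.append([a])
    return [set(c) for c in classes]


window = range(-5, 6)                        # a finite piece of Z
classes = equivalence_classes(window, lambda a, b: (a - b) % 2 == 0)
```

The result consists of the odd and the even members of the window, two disjoint sets whose union is the whole window.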

The first two examples of equivalence relations in this book arise in Section II.3. The first example, which is captured in the definition of square matrices that are “similar,” yields equivalence classes exactly as above. A square matrix A is similar to a square matrix B if there is an invertible matrix C with $B = C^{-1}AC$. The text does not mention in Chapter II that similarity is an equivalence relation, but it is routine to check that it is reflexive, symmetric, and transitive. The second example is a relation “is isomorphic to” and implicitly is defined on the class of all vector spaces. This class is not a set, and Section A1 of this appendix suggested avoiding using classes that are not sets in order to avoid the logical paradoxes mentioned at the beginning of the appendix. There is not much problem with using general classes in this particular situation, but there is a simple approach in this situation for eliminating classes that are not sets and thereby following the suggestion of Section A1 without making an exception. The approach is to work with any subclass of vector spaces that is a set. The equivalence relation is well defined on the set of vector spaces in question, and the proposition yields equivalence classes within that set. This set can be an arbitrary subclass of the class of all vector spaces that happens to be a set, and the practical effect is the same as if the equivalence relation had been defined on the class of all vector spaces.
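For the record, the routine check that similarity is an equivalence relation can be written out as follows, where I is the identity matrix and C, E are invertible matrices of the same size as A:

$$\text{reflexive: } A = I^{-1} A I,$$

$$\text{symmetric: } B = C^{-1} A C \implies A = C B C^{-1} = (C^{-1})^{-1} B \, (C^{-1}),$$

$$\text{transitive: } B = C^{-1} A C,\ D = E^{-1} B E \implies D = E^{-1} C^{-1} A C E = (CE)^{-1} A \, (CE).$$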


A3.  Real  Numbers 

Real  numbers  are  taken  as  known,  as  are  the  rational  numbers  from  which  they 
are  constructed.  It  will  be  useful,  however,  to  review  the  constructions  of  both 
these  number  systems  so  as  to  be  able  to  discuss  the  solvability  of  polynomial 
equations  better. 

We take the set Z of integers as given, along with its ordering and its operations of addition, subtraction, and multiplication. The set Q of rational numbers is constructed rigorously from Z as follows. We start from the set of ordered pairs $(a, b)$ of integers such that $b \ne 0$. The idea is that $(a, b)$ is to correspond to $a/b$ and that we want $(na, nb)$ to correspond to the same $a/b$ if n is any nonzero integer. Thus we say that two such pairs have $(a, b) \sim (c, d)$ if $ad = bc$. This relation is evidently reflexive and symmetric, and it will be an equivalence relation if it is transitive. If $(a, b) \sim (c, d)$ and $(c, d) \sim (e, f)$, then $ad = bc$ and $cf = de$. So $adf = bcf = bde$. Since $d \ne 0$, $af = be$ and $\sim$ is transitive.

From Section A2 the set of such pairs is partitioned into equivalence classes by means of $\sim$. Each equivalence class is called a rational number. To define the arithmetic operations on rational numbers, we first define operations on pairs, and then we check that the operations respect the partitioning into classes. For addition, the definition is $(a, b) + (c, d) = (ad + bc, bd)$. What needs checking is that if $(a, b) \sim (a', b')$ and $(c, d) \sim (c', d')$, then $(ad + bc, bd) \sim (a'd' + b'c', b'd')$. This is a routine matter: using $ab' = a'b$ and $cd' = c'd$, we have $(ad + bc)(b'd') = ab'dd' + bb'cd' = a'bdd' + bb'c'd = (a'd' + b'c')bd$, and thus addition of rational numbers is well defined. The operations on pairs for negative, multiplication, and reciprocal are $-(a, b) = (-a, b)$, $(a, b)(c, d) = (ac, bd)$, and $(a, b)^{-1} = (b, a)$ (the last only for $a \ne 0$), and we readily check that these define corresponding operations on rational numbers. Finally one derives the familiar associative, commutative, and distributive laws for these operations on Q.
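The construction can be checked mechanically on small cases. The Python sketch below is an illustration under stated simplifications: a hypothetical Pair class stores a representative pair, its built-in equality compares entries literally (so equivalence is a separate method), and well-definedness is tested only for representatives of the form $(na, nb)$ drawn from a small grid, with Python's standard fractions.Fraction serving as an independent cross-check.

```python
# Illustrative sketch; Pair and scaled are hypothetical names.
from dataclasses import dataclass
from fractions import Fraction
from itertools import product


@dataclass(frozen=True)
class Pair:
    """A pair (a, b) of integers with b != 0, standing for the rational a/b."""
    a: int
    b: int

    def __post_init__(self):
        if self.b == 0:
            raise ValueError("second entry must be nonzero")

    def equivalent(self, other):
        # (a, b) ~ (c, d) exactly when ad = bc
        return self.a * other.b == self.b * other.a

    def __add__(self, other):
        # (a, b) + (c, d) = (ad + bc, bd)
        return Pair(self.a * other.b + self.b * other.a, self.b * other.b)

    def __mul__(self, other):
        # (a, b)(c, d) = (ac, bd)
        return Pair(self.a * other.a, self.b * other.b)

    def __neg__(self):
        # -(a, b) = (-a, b)
        return Pair(-self.a, self.b)

    def value(self):
        # Python's Fraction, used only as an independent cross-check
        return Fraction(self.a, self.b)


def scaled(p, n):
    """The equivalent representative (na, nb) of p, for a nonzero integer n."""
    return Pair(n * p.a, n * p.b)


pairs = [Pair(a, b) for a in range(-3, 4) for b in range(-3, 4) if b != 0]
well_defined = all(
    (scaled(p, m) + scaled(q, n)).equivalent(p + q)
    and (scaled(p, m) * scaled(q, n)).equivalent(p * q)
    for p, q in product(pairs, repeat=2)
    for m, n in product([-2, 1, 3], repeat=2)
)
```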

The  above  construction  is  repeated,  with  more  details,  in  the  more  general 
construction  of  “fields  of  fractions”  in  Chapter  VIII. 

Inequalities on rational numbers are defined from inequalities on integers, taking into account that an inequality between integers is preserved when multiplied by a positive integer. Each rational number has a representative pair $(a, b)$ with $b > 0$ because any pair with $b < 0$ can be replaced by the pair of negatives. Thus let $(a, b)$ and $(c, d)$ be given with $b > 0$ and $d > 0$. We say that $(a, b) < (c, d)$ if $ad < bc$. One readily checks that this ordering respects equivalence classes and leads to the usual properties of the ordering on Q. The positive rationals are those greater than 0, and the negative rationals are those less than 0.
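The normalization to positive second entries matters, since multiplying an inequality by a negative number reverses it. This Python sketch is illustrative (normalize and less_than are hypothetical names); it applies the definition to pairs of integers and compares the result with Python's standard fractions.Fraction on a small grid.

```python
# Illustrative sketch; normalize and less_than are hypothetical names.
from fractions import Fraction
from itertools import product


def normalize(a, b):
    """An equivalent pair with positive second entry (negate both entries if needed)."""
    return (a, b) if b > 0 else (-a, -b)


def less_than(p, q):
    """(a, b) < (c, d) means ad < bc, once both second entries are positive."""
    a, b = normalize(*p)
    c, d = normalize(*q)
    return a * d < b * c


grid = [(a, b) for a in range(-4, 5) for b in range(-4, 5) if b != 0]
```

Without normalize, the pair $(1, -2)$ would not count as less than $(0, 1)$, since $1 \cdot 1 < (-2) \cdot 0$ is false, even though $-1/2 < 0$.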

The  formal  definition  is  that  a  real  number  is  a  cut  of  rational  numbers,  i.e., 
a  subset  of  rational  numbers  that  is  neither  Q  nor  the  empty  set,  has  no  largest 
element,  and  contains  all  rational  numbers  less  than  any  rational  that  it  contains. 
The  set  of  cuts,  i.e.,  the  set  of  real  numbers,  is  denoted  by  R.  The  idea  of  the 
construction is as follows: Each rational number q determines a cut $q^*$, namely
the  set  of  all  rationals  less  than  q.  Under  the  identification  of  Q  with  a  subset  of 
R,  the  cut  defining  a  real  number  consists  of  all  rational  numbers  less  than  the 
given  real  number. 

The set of cuts gets a natural ordering, given by inclusion. In place of $\subseteq$, we write $\le$. For any two cuts r and s, we have $r \le s$ or $s \le r$, and if both occur, then $r = s$. We can then define $<$, $\ge$, and $>$ in the expected way. The positive cuts r are those with $0^* < r$, and the negative cuts are those with $r < 0^*$.

Once cuts and their ordering are in place, one can go about defining the usual operations of arithmetic and proving that R with these operations satisfies the familiar associative, commutative, and distributive laws, and that these interact with inequalities in the usual ways. The definition of addition is easy: the sum of two cuts is simply the set of sums of the rationals from the respective cuts. Negatives, and hence differences, require a little more care, and for multiplication and reciprocals one has to take signs into account. For example the product of two positive cuts consists of all products of positive rationals from the two cuts, as well as 0 and all negative rationals. After these definitions and the proofs of the usual arithmetic laws are complete, it is customary to write 0 and 1 in place of $0^*$ and $1^*$.

This much allows us to define $n$th roots. The following proposition gives the precise details.

Proposition. If r is a positive real number and n is a positive integer, then there exists a unique positive real number s such that $s^n = r$.

Remark. In the terminology and notation introduced in Section I.3, the polynomial $X^n - r$ in $R[X]$ has a unique positive root if r is positive in R.

Sketch of proof. Let s consist of all positive rationals q such that $q^n < r$,
together with all rationals ≤ 0. One checks that s is a cut and that $s^n = r$. This
proves existence. For uniqueness any positive cut s' with $(s')^n = r$ must contain
exactly the same rationals and hence must equal s. □
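The cut in the sketch of proof can be explored numerically. The following Python sketch is only an illustration, not part of the construction: it represents the cut s by its membership test (q is in s exactly when q ≤ 0 or $q^n < r$), uses exact rationals from the standard `fractions` module, and brackets the positive nth root between a rational in the cut and a rational outside it by repeated halving. The names `in_root_cut` and `bracket_root` are hypothetical, and r is assumed to be a positive rational so that every test is exact.

```python
from fractions import Fraction


def in_root_cut(q: Fraction, r: Fraction, n: int) -> bool:
    """Membership test for the cut s = {q : q <= 0 or q**n < r}."""
    return q <= 0 or q ** n < r


def bracket_root(r: Fraction, n: int, steps: int = 40) -> tuple[Fraction, Fraction]:
    """Return rationals lo, hi with lo in the cut and hi outside it.

    Since the root s satisfies s**n = r, this gives lo < s <= hi, and each
    step halves the width of the bracket.
    """
    if r <= 0 or n < 1:
        raise ValueError("need r > 0 and n >= 1")
    lo = Fraction(0)               # 0 is in the cut because 0 <= 0
    hi = max(Fraction(1), r)       # hi**n >= r, so hi is not in the cut
    for _ in range(steps):
        mid = (lo + hi) / 2
        if in_root_cut(mid, r, n):
            lo = mid               # mid is in the cut, so mid < s
        else:
            hi = mid               # mid is outside the cut, so mid >= s
    return lo, hi


lo, hi = bracket_root(Fraction(2), 2)
print(float(lo), float(hi))
```

Both printed numbers approximate the square root of 2; the exact rationals lo and hi differ by $2/2^{40}$.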
To make efficient use of cuts in connection with arithmetic and algebra, one
needs to develop a certain amount of real-variable theory. This theory will not
be developed in any detail here; let us be content with a sketch, giving a proof of
the one specific result that we shall need.⁵

The first step in the process is to observe that any nonempty subset of reals
with an upper bound has a least upper bound (the supremum, written as sup).
This  is  proved  by  taking  the  union  of  the  cuts  for  each  of  the  given  real  numbers 
and  showing  that  the  result  is  a  cut.  Similarly  any  nonempty  subset  of  reals  with 
a  lower  bound  has  a  greatest  lower  bound  (the  infimum,  written  as  inf).  This 
property  follows  by  applying  the  least-upper-bound  property  to  the  negatives  of 
the  given  reals  and  then  taking  the  negative  of  the  resulting  least  upper  bound. 

Meanwhile,  we  can  introduce  sequences  of  real  numbers  and  convergence 
of  sequences  in  the  usual  way.  In  terms  of  convergence,  the  key  property  of 
sequences  of  real  numbers  is  given  by  the  Bolzano-Weierstrass  Theorem:  any 
bounded sequence has a convergent subsequence. In fact, if the given bounded
sequence is $\{s_n\}$, it can be shown that there is a subsequence convergent to the
greatest lower bound over m of the least upper bound for k ≥ m of the numbers
$s_k$, that is, to $\inf_m \sup_{k \ge m} s_k$.

Next  one  introduces  continuity  of  functions  in  the  usual  way.  The  Bolzano- 
Weierstrass  Theorem  may  readily  be  used  to  prove  that  any  continuous  real-valued 
function  on  a  closed  bounded  interval  takes  on  its  maximum  and  minimum  values. 
With  a  little  more  effort  the  Bolzano-Weierstrass  Theorem  may  be  used  also 
to  show  that  any  continuous  real-valued  function  on  a  closed  bounded  interval 
is  uniformly  continuous.  That  brings  us  to  the  theorem  that  we  shall  use  in 
developing  basic  algebra. 

Theorem (Intermediate Value Theorem). Let a < b be real numbers, and let
f : [a, b] → R be continuous. Then f, in the interval [a, b], takes on all values
between f(a) and f(b).

PROOF. Let $f(a) = \alpha$ and $f(b) = \beta$, and let $\gamma$ be between $\alpha$ and $\beta$. We may
assume that $\gamma$ is in fact strictly between $\alpha$ and $\beta$. Possibly by replacing f by
−f, we may assume that also $\alpha < \beta$. Let

$$A = \{x \in [a, b] \mid f(x) \le \gamma\} \quad\text{and}\quad B = \{x \in [a, b] \mid f(x) \ge \gamma\}.$$

These sets are nonempty since a is in A and b is in B, and f is bounded since
any continuous function on a closed bounded interval takes on finite maximum
and minimum values. Thus the numbers $\gamma_1 = \sup\{f(x) \mid x \in A\}$ and
$\gamma_2 = \inf\{f(x) \mid x \in B\}$ are well defined and have $\gamma_1 \le \gamma \le \gamma_2$.

⁵ Details of the omitted steps may be found, for example, in Section I.1 of the author’s book Basic
Real Analysis.
If $\gamma_1 = \gamma$, then we can find a sequence $\{x_n\}$ in A such that $f(x_n)$ converges to $\gamma$.
Using the Bolzano-Weierstrass Theorem, we can find a convergent subsequence
$\{x_{n_k}\}$ of $\{x_n\}$, say with limit $x_0$. By continuity of f, $\{f(x_{n_k})\}$ converges to $f(x_0)$.
Then $f(x_0) = \gamma_1 = \gamma$, and we are done. Arguing by contradiction, we may
therefore assume that $\gamma_1 < \gamma$. Similarly we may assume that $\gamma < \gamma_2$, but we do
not need to do so.

Let $\varepsilon = \gamma_2 - \gamma_1$, and choose, since the continuous function f is necessarily
uniformly continuous, $\delta > 0$ such that $|x_1 - x_2| < \delta$ implies $|f(x_1) - f(x_2)| < \varepsilon$
whenever $x_1$ and $x_2$ both lie in [a, b]. Then choose an integer n such that
$2^{-n}(b - a) < \delta$, and consider the value of f at the points $p_k = a + k2^{-n}(b - a)$
for $0 \le k \le 2^n$. Since $p_{k+1} - p_k = 2^{-n}(b - a) < \delta$, we have $|f(p_{k+1}) - f(p_k)| < \varepsilon = \gamma_2 - \gamma_1$.
Consequently if $f(p_k) \le \gamma_1$, then

$$f(p_{k+1}) \le f(p_k) + |f(p_{k+1}) - f(p_k)| < \gamma_1 + (\gamma_2 - \gamma_1) = \gamma_2,$$

so that $p_{k+1}$ is not in B; hence $p_{k+1}$ is in A, and $f(p_{k+1}) \le \gamma_1$. Now
$f(p_0) = f(a) = \alpha \le \gamma_1$. Thus induction shows that $f(p_k) \le \gamma_1$ for all $k \le 2^n$.
However, for $k = 2^n$, we have $p_{2^n} = b$. Hence $f(b) = \beta > \gamma > \gamma_1$, and we have
arrived at a contradiction. □
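The proof above argues by contradiction and does not by itself locate a point where f takes the value γ. A standard constructive companion, which is not the argument used in the text, is bisection: keep an interval on whose endpoints f − γ has opposite signs and halve it repeatedly. The Python sketch below uses floating-point arithmetic, so it only approximates such a point; the name `bisect_to_value` and the parameters `tol` and `max_iter` are illustrative assumptions.

```python
from typing import Callable


def bisect_to_value(f: Callable[[float], float], a: float, b: float,
                    gamma: float, tol: float = 1e-12, max_iter: int = 200) -> float:
    """Approximate a point x in [a, b] with f(x) = gamma.

    Assumes f is continuous on [a, b] and gamma lies between f(a) and f(b).
    """
    ga, gb = f(a) - gamma, f(b) - gamma
    if ga == 0:
        return a
    if gb == 0:
        return b
    if (ga < 0) == (gb < 0):
        raise ValueError("gamma must lie between f(a) and f(b)")
    for _ in range(max_iter):
        if b - a <= tol:
            break
        m = (a + b) / 2
        gm = f(m) - gamma
        if gm == 0:
            return m
        if (gm < 0) == (ga < 0):
            a, ga = m, gm      # same sign as at a: the crossing lies in [m, b]
        else:
            b = m              # opposite sign: the crossing lies in [a, m]
    return (a + b) / 2


x = bisect_to_value(lambda t: t ** 3 - t, 1.0, 2.0, 3.0)
print(x, x ** 3 - x)
```

Here $f(t) = t^3 - t$ has f(1) = 0 and f(2) = 6, so γ = 3 lies between them, and the second printed number is close to 3.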


A4.  Complex  Numbers 

Complex  numbers  are  taken  as  known,  and  this  section  reviews  their  notation  and 
basic  properties. 

Briefly, the system C of complex numbers is a two-dimensional vector space
over R with a distinguished basis {1, i} and a multiplication defined initially by
1·1 = 1, 1·i = i·1 = i, and i·i = −1. Elements may then be written as a + bi or
a + ib with a and b in R; here a is an abbreviation for a1. The multiplication is
extended to all of C so that the distributive laws hold, i.e., so that (a + bi)(c + di)
can be expanded in the expected way. The multiplication is associative and
commutative, the element 1 acts as a multiplicative identity, and every nonzero
element has a multiplicative inverse: $(a + bi)\bigl(\frac{a}{a^2 + b^2} - i\,\frac{b}{a^2 + b^2}\bigr) = 1$.
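To see the definition at work, the following Python sketch models a + bi as the pair (a, b) of coordinates relative to the basis {1, i}, implements the product that the distributive laws and ii = −1 force, and implements the inverse formula just stated. Exact rationals from the standard `fractions` module are used so that the identity $z z^{-1} = 1$ can be checked exactly; the helper names `mul` and `inverse` are hypothetical.

```python
from fractions import Fraction

# a + bi is stored as the pair (a, b): coordinates in the basis {1, i}.
Pair = tuple[Fraction, Fraction]


def mul(z: Pair, w: Pair) -> Pair:
    """(a + bi)(c + di) = (ac - bd) + (ad + bc)i, using ii = -1."""
    a, b = z
    c, d = w
    return (a * c - b * d, a * d + b * c)


def inverse(z: Pair) -> Pair:
    """Inverse a/(a^2 + b^2) - i b/(a^2 + b^2) of a nonzero z = a + bi."""
    a, b = z
    norm = a * a + b * b
    if norm == 0:
        raise ZeroDivisionError("0 has no multiplicative inverse")
    return (a / norm, -b / norm)


i = (Fraction(0), Fraction(1))
z = (Fraction(3), Fraction(-4))
print(mul(i, i))            # the pair representing -1
print(mul(z, inverse(z)))   # the pair representing 1
```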

Complex conjugation is indicated by a bar: the conjugate of a + bi is a − bi
if a and b are real, and we write $\overline{a + bi} = a - bi$. Then we have $\overline{z + w} = \bar z + \bar w$,
$\overline{rz} = r\bar z$ if r is real, and $\overline{zw} = \bar z\,\bar w$.

The real and imaginary parts of z = a + bi are Re z = a and Im z = b.
These may be computed as $\operatorname{Re} z = \frac{1}{2}(z + \bar z)$ and $\operatorname{Im} z = \frac{1}{2i}(z - \bar z)$.

The absolute value function of z = a + bi is given by $|z| = \sqrt{a^2 + b^2}$, and
this satisfies $|z|^2 = z\bar z$. It has the simple properties that $|\bar z| = |z|$, $|\operatorname{Re} z| \le |z|$,
and $|\operatorname{Im} z| \le |z|$. In addition, it satisfies

$$|zw| = |z|\,|w|$$

because

$$|zw|^2 = zw\,\overline{zw} = zw\,\bar z\,\bar w = z\bar z\,w\bar w = |z|^2\,|w|^2,$$
and it satisfies the triangle inequality

$$|z + w| \le |z| + |w|$$

because

$$\begin{aligned}
|z + w|^2 &= (z + w)(\bar z + \bar w) = z\bar z + z\bar w + w\bar z + w\bar w \\
&= |z|^2 + 2\operatorname{Re}(z\bar w) + |w|^2 \le |z|^2 + 2|z\bar w| + |w|^2 \\
&= |z|^2 + 2|z|\,|w| + |w|^2 = (|z| + |w|)^2.
\end{aligned}$$
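The identities used for |zw| and for the triangle inequality are polynomial identities in the real and imaginary parts, so they can be spot-checked exactly on Gaussian integers a + bi with a and b integers. The Python sketch below does this on a small grid. It works with $|z|^2 = a^2 + b^2$ throughout to avoid square roots, and it checks $\operatorname{Re}(z\bar w) \le |z|\,|w|$ in the squared form $(\operatorname{Re}(z\bar w))^2 \le |z|^2|w|^2$ when $\operatorname{Re}(z\bar w) \ge 0$. It is a sanity check on sample values, not a proof, and the helper names are hypothetical.

```python
from itertools import product


def mul(z, w):
    """(a + bi)(c + di) = (ac - bd) + (ad + bc)i."""
    a, b = z
    c, d = w
    return (a * c - b * d, a * d + b * c)


def add(z, w):
    return (z[0] + w[0], z[1] + w[1])


def conj(z):
    """The conjugate of a + bi is a - bi."""
    return (z[0], -z[1])


def abs_sq(z):
    """|z|^2 = a^2 + b^2."""
    return z[0] ** 2 + z[1] ** 2


def check(z, w):
    re_zwbar = mul(z, conj(w))[0]                        # Re(z * conj(w))
    return (
        mul(z, conj(z)) == (abs_sq(z), 0)                # |z|^2 = z * conj(z)
        and abs_sq(mul(z, w)) == abs_sq(z) * abs_sq(w)   # |zw|^2 = |z|^2 |w|^2
        and abs_sq(add(z, w)) == abs_sq(z) + 2 * re_zwbar + abs_sq(w)
        and (re_zwbar <= 0 or re_zwbar ** 2 <= abs_sq(z) * abs_sq(w))
    )


grid = list(product(range(-3, 4), repeat=2))
print(all(check(z, w) for z, w in product(grid, repeat=2)))
```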


A5.  Partial  Orderings  and  Zorn’s  Lemma 

A partial ordering on a set S is a relation between S and itself, i.e., a subset of
S × S, satisfying two properties. We define the expression a ≤ b to mean that the
ordered pair (a, b) is a member of the relation, and we say that “≤” is the partial
ordering. The properties are

(i) a ≤ a for all a in S, i.e., ≤ is reflexive,

(ii) a ≤ b and b ≤ c together imply a ≤ c whenever a, b, and c are in S, i.e.,
≤ is transitive.

An example of such an S is any set of subsets of a set X, with ≤ taken to
be inclusion ⊆. This particular partial ordering has a third property of interest,
namely

(iii) a ≤ b and b ≤ a with a and b in S imply a = b.

However, the validity of (iii) has no bearing on Zorn’s Lemma below. A partial
ordering is said to be a total ordering or simple ordering if (iii) holds and also

(iv) any a and b in S have a ≤ b or b ≤ a or both.

For  the  sake  of  a  result  to  be  proved  at  the  end  of  the  section,  let  us  interpolate 
one  further  definition:  a  totally  ordered  set  is  said  to  be  well  ordered  if  every 
nonempty  subset  has  a  least  element,  i.e.,  if  each  nonempty  subset  contains  an 
element a such that a ≤ b for all b in the subset.

A chain in a partially ordered set S is a totally ordered subset. An upper
bound for a chain T is an element u in S such that c ≤ u for all c in T. A
maximal element in S is an element m such that whenever m ≤ a for some a in
S, then a ≤ m. (If (iii) holds, we can conclude in this case that m = a.)
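For finite sets, properties (i) through (iv) and the notion of maximal element can be checked by brute force. The Python sketch below does so for a relation supplied as a function `leq(a, b)`; all function names are hypothetical. It is applied to the subsets of {1, 2, 3} ordered by inclusion, and to a two-element set on which every pair is related, where (iii) fails and both elements are maximal.

```python
from itertools import combinations, product


def is_partial_ordering(S, leq):
    """Properties (i) reflexive and (ii) transitive."""
    reflexive = all(leq(a, a) for a in S)
    transitive = all(leq(a, c) for a, b, c in product(S, repeat=3)
                     if leq(a, b) and leq(b, c))
    return reflexive and transitive


def has_property_iii(S, leq):
    """Property (iii): a <= b and b <= a imply a == b."""
    return all(a == b for a, b in product(S, repeat=2) if leq(a, b) and leq(b, a))


def is_total(S, leq):
    """A total ordering: a partial ordering satisfying (iii) and (iv)."""
    return (is_partial_ordering(S, leq) and has_property_iii(S, leq)
            and all(leq(a, b) or leq(b, a) for a, b in product(S, repeat=2)))


def maximal_elements(S, leq):
    """Elements m such that m <= a implies a <= m."""
    return [m for m in S if all(leq(a, m) for a in S if leq(m, a))]


def inclusion(a, b):
    """Subset test; for frozensets, <= means 'is a subset of'."""
    return a <= b


def always(a, b):
    """Relates every pair, so x <= y and y <= x although x != y."""
    return True


subsets = [frozenset(c) for r in range(4) for c in combinations([1, 2, 3], r)]
print(is_partial_ordering(subsets, inclusion), is_total(subsets, inclusion))
print(maximal_elements(subsets, inclusion))
print(has_property_iii(["x", "y"], always), maximal_elements(["x", "y"], always))
```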

Zorn’s  Lemma.  If  S  is  a  nonempty  partially  ordered  set  in  which  every  chain 
has  an  upper  bound,  then  S  has  a  maximal  element. 

Remarks. Zorn’s Lemma will be proved below using the Axiom of Choice,
which was stated in Section A1. It is an easy exercise to see, conversely,
that Zorn’s Lemma implies the Axiom of Choice. It is customary with many
mathematical writers to mention Zorn’s Lemma each time it is invoked, even
though most writers nowadays do not ordinarily acknowledge uses of the Axiom
of Choice. Before coming to the proof, we give an example of how Zorn’s Lemma
is used. This example uses vector spaces and is expanded upon in Section II.9.

Example. Zorn’s Lemma gives a quick proof that any real vector space V
has a basis. In fact, let S be the set of all linearly independent subsets of V, and
order S by inclusion upward as in the example above of a partial ordering. The
set S is nonempty because ∅ is a linearly independent subset of V. Let T be a
chain in S, and let u be the union of the members of T. If t is in T, we certainly
have t ⊆ u. Let us see that u is linearly independent. For u to be dependent
would mean that there are vectors $x_1, \dots, x_n$ in u with $r_1x_1 + \cdots + r_nx_n = 0$ for
some system of real numbers not all 0. Let $x_j$ be in the member $t_j$ of the chain
T. Since $t_1 \subseteq t_2$ or $t_2 \subseteq t_1$, $x_1$ and $x_2$ are both in $t_1$ or both in $t_2$. To keep the
notation neutral, say they are both in $t'_2$. Since $t'_2 \subseteq t_3$ or $t_3 \subseteq t'_2$, all of $x_1, x_2, x_3$
are in $t_3$ or they are all in $t'_2$. Say they are all in $t'_3$. Continuing in this way,
we arrive at one of the sets $t_j$, say $t'_n$, such that $x_1, \dots, x_n$ are all in $t'_n$. The
members of $t'_n$ are linearly independent by assumption, and we obtain the
contradiction $r_1 = \cdots = r_n = 0$. We conclude that u is linearly independent, so
that u is a member of S and is an upper bound in S for the chain T. By Zorn’s
Lemma, S has a maximal element, say m. If m is not a basis, it fails to span. If a
vector x is not in its span, it is routine to see that m ∪ {x} is linearly independent
and properly contains m, in contradiction to the maximality of m. We conclude
that m is a basis.
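In finite dimensions no appeal to Zorn’s Lemma is needed, but the final step of the example, adjoining a vector that lies outside the span of a linearly independent set, can be carried out directly. The Python sketch below works over the rationals (an assumption made so that the arithmetic is exact; the text works over R), decides whether x lies outside the span of the current set by comparing ranks computed with Gaussian elimination, and greedily enlarges a given independent set using a finite list of candidate vectors. The names `rank` and `extend_independent` are hypothetical.

```python
from fractions import Fraction


def rank(vectors):
    """Rank over Q of a list of equal-length vectors, by Gaussian elimination."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r


def extend_independent(independent, candidates):
    """Adjoin each candidate that is not in the span of the set built so far."""
    basis = list(independent)
    for x in candidates:
        if rank(basis + [x]) == len(basis) + 1:   # x is outside the span
            basis.append(x)
    return basis


start = [(1, 1, 0)]
candidates = [(2, 2, 0), (1, 0, 0), (0, 1, 0), (0, 0, 5), (1, 2, 3)]
result = extend_independent(start, candidates)
print(result)
```

The result is linearly independent and every candidate lies in its span, so it is a maximal independent subset of the vectors considered, the finite counterpart of the maximal element m.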

We now begin the proof of Zorn’s Lemma. If T is a chain in a partially ordered
set S, then an upper bound $u_0$ for T is a least upper bound for T if $u_0 \le u$ for all
upper bounds u of T. If (iii) holds in S, then there can be at most one least upper
bound for T. In fact, if $u_0$ and $u'_0$ are least upper bounds, then $u_0 \le u'_0$ since
$u_0$ is a least upper bound, and $u'_0 \le u_0$ since $u'_0$ is a least upper bound; by (iii),
$u_0 = u'_0$. The proof follows that in Dunford-Schwartz’s Linear Operators I.

Lemma. Let X be a nonempty partially ordered set such that (iii) holds, and
write ≤ for the partial ordering. Suppose that X has the additional property that
each nonempty chain in X has a least upper bound in X. If f : X → X is a
function such that x ≤ f(x) for all x in X, then there exists an $x_0$ in X with
$f(x_0) = x_0$.

Proof. A nonempty subset E of X will be called admissible for purposes of
this proof if f(E) ⊆ E and if the least upper bound of each nonempty chain in
E, which exists in X by assumption, actually lies in E. By assumption, X is an
admissible subset of X. If x is in X, then the intersection of admissible subsets of
X containing x is admissible. Let $A_x$ be the intersection of all admissible subsets
of X containing x. This is admissible, and since the set of all y in X with x ≤ y
is admissible and contains x, it follows that x ≤ y for all y ∈ $A_x$. By hypothesis,
X is nonempty. Fix an element a in X, and let $A = A_a$. The main step will be to
prove that A is a chain.

To do so, consider the subset C of members x of A with the property that there
is a nonempty chain $C_x$ in A containing a and x such that

• a ≤ y ≤ x for all y in $C_x$,

• $f(C_x - \{x\}) \subseteq C_x$, and

• the least upper bound of any nonempty subchain of $C_x$ is in $C_x$.

The element a is in C because we can take $C_a = \{a\}$. If x is in C, so that $C_x$
exists, let us use the bulleted properties to see that

$$A = A_x \cup C_x. \qquad (*)$$

We have $A \supseteq C_x$ by definition; also $A \cap A_x$ is an admissible set containing x and
hence containing $A_x$, and thus $A \supseteq A_x$. Therefore $A \supseteq A_x \cup C_x$. For the reverse
inclusion it is enough to prove that $A_x \cup C_x$ is an admissible subset of X containing
a. The element a is in $C_x$, and thus a is in $A_x \cup C_x$. For the admissibility we have to
show that $f(A_x \cup C_x) \subseteq A_x \cup C_x$ and that the least upper bound of any nonempty
chain in $A_x \cup C_x$ lies in $A_x \cup C_x$. Since x lies in $A_x$, $A_x \cup C_x = A_x \cup (C_x - \{x\})$
and $f(A_x \cup C_x) = f(A_x) \cup f(C_x - \{x\}) \subseteq A_x \cup C_x$, the inclusion following
from the admissibility of $A_x$ and the second bulleted property of $C_x$.

To complete the proof of (*), take a nonempty chain in $A_x \cup C_x$, and let u be
its least upper bound in X; it is enough to show that u is in $A_x \cup C_x$. The element
u is necessarily in A since A is admissible. Observe that

$$y \le x \ \text{ and } \ x \le z \quad \text{whenever } y \text{ is in } C_x \text{ and } z \text{ is in } A_x. \qquad (**)$$

If the chain has at least one member in $A_x$, then (**) implies that x ≤ u, and
hence the set of members of the chain that lie in $A_x$ forms a nonempty chain in
$A_x$ with least upper bound u. Since $A_x$ is admissible, u is in $A_x$. Otherwise the
chain has all its members in $C_x$, and then u is in $C_x$ by the third bulleted property
of $C_x$.

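This completes the proof of (*). Let us now prove that if $C_x$ and $C_{x'}$ exist with
x ≤ x' and x ≠ x', then

$$C_x \subseteq C_{x'}. \qquad (\dagger)$$

In fact, application of (*) to x' gives $A = A_{x'} \cup C_{x'}$. Intersecting both sides with
$C_x$ shows that $C_x = (C_x \cap A_{x'}) \cup (C_x \cap C_{x'})$. On the right side, the first member
is empty, since a member y of it would satisfy x' ≤ y ≤ x by (**), and together
with x ≤ x' this would force x = x' by (iii). Thus $C_x = C_x \cap C_{x'}$. This proves (†).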

Let C be the set of all members x of A for which $C_x$ exists. We have seen that
a is in C. If x and x' are members of C, then (*) for x shows that x' lies in $A_x$ or
in $C_x$, and (**) then gives x ≤ x' or x' ≤ x. That is, C is a chain.
Let us see that f(C) ⊆ C. If x is in C, then the set $D = C_x \cup \{f(x)\}$ certainly
has a as a member. The second bulleted property of $C_x$ shows that f carries
$C_x - \{x\}$ into D, and also f carries x into D. Thus f carries $D - \{f(x)\}$ into
D, and D satisfies the second bulleted property of $C_{f(x)}$. If $\{x_\alpha\}$ is a chain in D
with least upper bound u, there are two possibilities. Either u is f(x), which is
in D by construction, or u is in $C_x$, which contains the least upper bound of any
nonempty chain in it. Thus u is in D, D satisfies the third bulleted property of
$C_{f(x)}$, and $C_{f(x)}$ exists. In other words, f(x) is in C, and f(C) ⊆ C.

Finally let us see that the least upper bound u of an arbitrary chain $\{x_\alpha\}$ in C,
which exists in X by assumption, is a member of C. If $x_\alpha = u$ for some α, then
$C_u = C_{x_\alpha}$ exists, and u is in C. So assume that $x_\alpha \ne u$ for all α. Our candidate
for $C_u$ will be $D = \bigl(\bigcup_\alpha C_{x_\alpha}\bigr) \cup \{u\}$. This certainly contains a. We check that
D satisfies the second bulleted property of $C_u$. For each α, we can find a β with
$x_\alpha \le x_\beta$ and $x_\alpha \ne x_\beta$, since u is the least upper bound of all the x’s. Then (†)
gives $C_{x_\alpha} \subseteq C_{x_\beta} - \{x_\beta\}$, and $f(C_{x_\alpha}) \subseteq f(C_{x_\beta} - \{x_\beta\}) \subseteq C_{x_\beta} \subseteq D$. Taking the
union over α shows that D satisfies the second bulleted property of $C_u$.

To see that D satisfies the third bulleted property of $C_u$, let v be the least upper
bound in X of a chain $\{y_\beta\}$ in D. If v = u, then v is in D by construction. If v ≠ u,
then v cannot be an upper bound of $\{x_\alpha\}$, since v ≤ u and u is the least upper
bound of $\{x_\alpha\}$. So we can choose some $x_{\alpha_0}$ such that $v \le x_{\alpha_0}$. Each $y_\beta$ is ≤ v, and
thus each $y_\beta$ is ≤ $x_{\alpha_0}$. Referring to (*), we see that all $y_\beta$’s lie in $C_{x_{\alpha_0}}$. By the third
bulleted property of $C_{x_{\alpha_0}}$, v is in $C_{x_{\alpha_0}}$. Thus v is in D, and D satisfies the third
bulleted property of $C_u$. Consequently the least upper bound u of an arbitrary
chain in C lies in C.

In short, C is an admissible set containing a, and it also is a chain. Since A is
a minimal admissible set containing a, C = A and also A is a chain. Let u be the
least upper bound of A; since A is admissible, u lies in A. We have seen that
f(A) ⊆ A, and thus f(u) lies in A and f(u) ≤ u. On the other hand, u ≤ f(u) by the
defining property of f. Therefore f(u) = u by (iii), and the proof is complete.
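When X is finite, the conclusion of the lemma can be reached far more simply, and the following Python sketch (an illustration of the finite case only, with hypothetical names) does so. Starting from a, the iterates form a chain a ≤ f(a) ≤ f(f(a)) ≤ ⋯ because x ≤ f(x); if two iterates coincide, transitivity and (iii) force a fixed point at that spot, and in a finite set some repetition must occur. The transfinite argument above is what replaces this simple iteration when X is infinite.

```python
def iterate_to_fixed_point(a, f, max_steps=10_000):
    """Return a fixed point of f reached from a by iteration.

    Assumes f maps a finite partially ordered set X satisfying (iii) into
    itself with x <= f(x) for all x, so that the iterates stabilize.
    """
    x = a
    for _ in range(max_steps):
        y = f(x)
        if y == x:
            return x
        x = y
    raise RuntimeError("no fixed point within max_steps; is X finite?")


def add_smallest_missing(E, universe=frozenset(range(5))):
    """A map with E contained in f(E): adjoin the least missing element, if any."""
    missing = sorted(universe - E)
    return E | {missing[0]} if missing else E


print(iterate_to_fixed_point(frozenset(), add_smallest_missing))
```

With subsets of {0, 1, 2, 3, 4} ordered by inclusion, the iteration stops at the full set, which is the only fixed point of this particular map.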

Proof of Zorn’s Lemma. Let S be a partially ordered set, with partial
ordering ≤, in which every chain has an upper bound. Let X be the partially
ordered system, ordered by inclusion upward ⊆, of nonempty chains⁶ in S. The
partially ordered system X, being given by ordinary inclusion, satisfies property
(iii). A nonempty chain C in X is a nested system of chains $c_\alpha$ of S, and $\bigcup_\alpha c_\alpha$ is
a chain in S that is a least upper bound for C. The lemma is therefore applicable
to any function f : X → X such that c ⊆ f(c) for all c in X. We use the lemma
to produce a maximal element of X, that is, a maximal chain in S.

Arguing by contradiction, suppose that no chain within S is maximal under

⁶Here a chain is simply a certain kind of subset of S, and no element of S can occur more than
once in it even if (iii) fails for the partial ordering. Thus if S = {x, y} with x ≤ y and y ≤ x, then
{x, y} is in X and in fact is maximal in X.
inclusion. For each nonempty chain c within S, let f(c) be a chain with c ⊆ f(c)
and c ≠ f(c). (This choice of f(c) for each c is where we use the Axiom of
Choice.) The result is a function f : X → X of the required kind, the lemma
says that f(c) = c for some c in X, and we arrive at a contradiction. We conclude
that there is some maximal chain $c_0$ within S.

By assumption in Zorn’s Lemma, every nonempty chain within S has an upper
bound. Let $u_0$ be an upper bound for the maximal chain $c_0$. If u is a member of S
with $u_0 \le u$, then $c_0 \cup \{u\}$ is a chain and maximality implies that $c_0 \cup \{u\} = c_0$.
Therefore u is in $c_0$, and $u \le u_0$. This is the condition that $u_0$ is a maximal
element of S. □

Corollary  (Zermelo's  Well-Ordering  Theorem).  Every  set  has  a  well  ordering. 

PROOF. Let S be a set, and let 𝓔 be the family of all pairs $(E, \le_E)$ such that E
is a subset of S and $\le_E$ is a well ordering of E. The family 𝓔 is nonempty since
(∅, ∅) is a member of it. We partially order 𝓔 by a notion of “inclusion as an
initial segment,” saying that $(E, \le_E) \le (F, \le_F)$ if

(i) E ⊆ F,

(ii) a and b in E with $a \le_E b$ implies $a \le_F b$,

(iii) a in E and b in F but not E together imply $a \le_F b$.

In preparation for applying Zorn’s Lemma, let $C = \{(E_\alpha, \le_\alpha)\}$ be a chain in 𝓔,
with the α’s running through some set I. Define $E_0 = \bigcup_\alpha E_\alpha$ and define $\le_0$ as
follows: If $e_1$ and $e_2$ are in $E_0$, let $e_1$ be in $E_{\alpha_1}$ with $\alpha_1$ in I, and let $e_2$ be in $E_{\alpha_2}$
with $\alpha_2$ in I. Since C is a chain, we may assume without loss of generality that
$(E_{\alpha_1}, \le_{\alpha_1}) \le (E_{\alpha_2}, \le_{\alpha_2})$, so that $E_{\alpha_1} \subseteq E_{\alpha_2}$ in particular. Then $e_1$ and $e_2$ are both
in $E_{\alpha_2}$, and we define $e_1 \le_0 e_2$ if $e_1 \le_{\alpha_2} e_2$, and $e_2 \le_0 e_1$ if $e_2 \le_{\alpha_2} e_1$. Because of
(i) and (ii) above, the result is well defined independently of the choice of $\alpha_1$ and
$\alpha_2$. Similar reasoning shows that $\le_0$ is a total ordering of $E_0$. If we can prove
that $\le_0$ is a well ordering, then $(E_0, \le_0)$ is evidently an upper bound in 𝓔 for the
chain C, and Zorn’s Lemma is applicable.

Now suppose that F is a nonempty subset of $E_0$. Pick an element of F, and
let $E_{\alpha_0}$ be a set in the chain that contains it. Since $(E_{\alpha_0}, \le_{\alpha_0})$ is well ordered and
$F \cap E_{\alpha_0}$ is nonempty, $F \cap E_{\alpha_0}$ contains a least element $f_0$ relative to $\le_{\alpha_0}$. We show
that $f_0 \le_0 f$ for all f in F. In fact, if f is given, there are two possibilities. One
is that f is in $E_{\alpha_0}$; in this case, the consistency of $\le_0$ with $\le_{\alpha_0}$ forces $f_0 \le_0 f$.
The other is that f is not in $E_{\alpha_0}$ but is in some $E_{\alpha_1}$. Since C is a chain and
$E_{\alpha_1} \subseteq E_{\alpha_0}$ fails, we must have $(E_{\alpha_0}, \le_{\alpha_0}) \le (E_{\alpha_1}, \le_{\alpha_1})$. Then f is in $E_{\alpha_1}$ but
not $E_{\alpha_0}$, and property (iii) above says that $f_0 \le_{\alpha_1} f$. By the consistency of the
orderings, $f_0 \le_0 f$. Hence $f_0$ is a least element in F, and $E_0$ is well ordered.

Application of Zorn’s Lemma produces a maximal element $(E, \le_E)$ of 𝓔. If
E were a proper subset of S, we could adjoin to E a member s of S not in E and
define every element e of E to be ≤ s. The result would contradict maximality.
Therefore E = S, and S has been well ordered. □


A6.  Cardinality 

Two sets A and B are said to have the same cardinality, written card A = card B,
if there exists a one-one function from A onto B. On any set 𝒜 of sets, “having the
same cardinality” is plainly an equivalence relation and therefore partitions 𝒜 into
disjoint  equivalence  classes,  the  sets  in  each  class  having  the  same  cardinality.  The 
question  of  what  constitutes  cardinality  (or  a  “cardinal  number”)  in  its  own  right 
is  one  that  is  addressed  in  set  theory  but  that  we  do  not  need  to  address  carefully 
here;  the  idea  is  that  each  equivalence  class  under  “having  the  same  cardinality” 
has  a  distinguished  representative,  and  the  cardinal  number  is  defined  to  be  that 
representative.  We  write  card  A  for  the  cardinal  number  of  a  set  A. 

Having addressed equality, we now introduce a partial ordering, saying that
card A ≤ card B if there is a one-one function from A into B. The first result below
is that card A ≤ card B and card B ≤ card A together imply card A = card B.

Proposition (Schroeder-Bernstein Theorem). If A and B are sets such that
there exist one-one functions f : A → B and g : B → A, then A and B have
the same cardinality.

Proof. Define the function $g^{-1} : \operatorname{image} g \to B$ by $g^{-1}(g(b)) = b$; this
definition makes sense since g is one-one. Write $(g \circ f)^{(n)}$ for the composition
of $g \circ f$ with itself n times, the 0-fold composition being the identity, and define
$(f \circ g)^{(n)}$ similarly. Define subsets $A_n$ and $A'_n$ of A and subsets $B_n$ and $B'_n$ of B
for n ≥ 0 by

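$$\begin{aligned}
A_n &= \operatorname{image}\bigl((g \circ f)^{(n)}\bigr) - \operatorname{image}\bigl((g \circ f)^{(n)} \circ g\bigr), \\
A'_n &= \operatorname{image}\bigl((g \circ f)^{(n)} \circ g\bigr) - \operatorname{image}\bigl((g \circ f)^{(n+1)}\bigr), \\
B_n &= \operatorname{image}\bigl((f \circ g)^{(n)}\bigr) - \operatorname{image}\bigl((f \circ g)^{(n)} \circ f\bigr), \\
B'_n &= \operatorname{image}\bigl((f \circ g)^{(n)} \circ f\bigr) - \operatorname{image}\bigl((f \circ g)^{(n+1)}\bigr),
\end{aligned}$$

and let

$$A_\infty = \bigcap_{n=0}^{\infty} \operatorname{image}\bigl((g \circ f)^{(n)}\bigr) \quad\text{and}\quad B_\infty = \bigcap_{n=0}^{\infty} \operatorname{image}\bigl((f \circ g)^{(n)}\bigr).$$

Then we have

$$A = A_\infty \cup \bigcup_{n=0}^{\infty} A_n \cup \bigcup_{n=0}^{\infty} A'_n \quad\text{and}\quad B = B_\infty \cup \bigcup_{n=0}^{\infty} B_n \cup \bigcup_{n=0}^{\infty} B'_n,$$

with both unions disjoint.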

Let us prove that f carries $A_n$ one-one onto $B'_n$. If a is in $A_n$, then
$a = (g \circ f)^{(n)}(x)$ for some x ∈ A and a is not of the form $(g \circ f)^{(n)}(g(y))$ with
y ∈ B. Applying f, we obtain $f(a) = (f \circ (g \circ f)^{(n)})(x) = (f \circ g)^{(n)}(f(x))$,
so that f(a) is in the image of $(f \circ g)^{(n)} \circ f$. Meanwhile, if f(a) is in the
image of $(f \circ g)^{(n+1)}$, then $f(a) = (f \circ g)^{(n+1)}(y) = f\bigl((g \circ f)^{(n)}(g(y))\bigr)$ for
some y ∈ B. Since f is one-one, we can cancel the f on the outside and obtain
$a = (g \circ f)^{(n)}(g(y))$, in contradiction to the fact that a is in $A_n$. Thus f carries
$A_n$ into $B'_n$, and it is certainly one-one. To see that $f(A_n)$ contains all of $B'_n$, let
b ∈ $B'_n$ be given. Then $b = (f \circ g)^{(n)}(f(x))$ for some x ∈ A and b is not of the
form $(f \circ g)^{(n+1)}(y)$ with y ∈ B. Hence $b = f\bigl((g \circ f)^{(n)}(x)\bigr)$, i.e., b = f(a)
with $a = (g \circ f)^{(n)}(x)$. If this element a were in the image of $(g \circ f)^{(n)} \circ g$,
we could write $a = (g \circ f)^{(n)}(g(y))$ for some y ∈ B, and then we would have
$b = f(a) = f\bigl((g \circ f)^{(n)}(g(y))\bigr) = (f \circ g)^{(n+1)}(y)$, contradiction. Thus a is in
$A_n$, and f carries $A_n$ one-one onto $B'_n$.

Similarly g carries $B_n$ one-one onto $A'_n$. Since $A'_n$ is in the image of g, we can
apply $g^{-1}$ to it and see that $g^{-1}$ carries $A'_n$ one-one onto $B_n$.

The same kind of reasoning as above shows that f carries $A_\infty$ one-one onto
$B_\infty$. In summary, f carries each $A_n$ one-one onto $B'_n$ and carries $A_\infty$ one-one
onto $B_\infty$, while $g^{-1}$ carries each $A'_n$ one-one onto $B_n$. Then the function

$$h = \begin{cases} f & \text{on } A_\infty \text{ and each } A_n, \\ g^{-1} & \text{on each } A'_n, \end{cases}$$

carries A one-one onto B. □
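The function h can be evaluated at a single point by tracing it backward through $g^{-1}$ and $f^{-1}$ and seeing where the trace stops: a trace that stops at a point of A outside image g means the starting point lies in some $A_n$, and a trace that stops at a point of B outside image f means it lies in some $A'_n$. The Python sketch below does this for the illustrative choice A = B = {0, 1, 2, ...} with f(a) = a + 1 and g(b) = b + 1, an assumption made for the example and not taken from the text. Here every trace stops, so $A_\infty$ is empty; on a point of $A_\infty$ the trace would never stop, and the sketch gives up after `max_steps`. The names `sb_h` and `shift_inv` are hypothetical.

```python
from typing import Callable, Optional


def sb_h(a: int,
         f: Callable[[int], int],
         f_inv: Callable[[int], Optional[int]],
         g_inv: Callable[[int], Optional[int]],
         max_steps: int = 10_000) -> int:
    """Evaluate h(a): f(a) if a is in some A_n, g^{-1}(a) if a is in some A'_n.

    f_inv and g_inv return None at points outside image f and image g.
    """
    x = a
    for _ in range(max_steps):
        b = g_inv(x)
        if b is None:          # trace stops in A - image g: a lies in some A_n
            return f(a)
        y = f_inv(b)
        if y is None:          # trace stops in B - image f: a lies in some A'_n
            return g_inv(a)
        x = y
    raise ValueError("trace did not stop; a may lie in A_infinity")


def f(a: int) -> int:
    """Example injection f(a) = a + 1 (g is the same map in this example)."""
    return a + 1


def shift_inv(x: int) -> Optional[int]:
    """Inverse of x -> x + 1 on its image {1, 2, ...}; None at 0."""
    return x - 1 if x >= 1 else None


result = [sb_h(a, f, shift_inv, shift_inv) for a in range(10)]
print(result)
```

In this example h interchanges 2k and 2k + 1: the even numbers lie in the sets $A_n$ and the odd numbers in the sets $A'_n$.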

Next we show that any two sets A and B have comparable cardinalities in the
sense that either card A ≤ card B or card B ≤ card A.

Proposition.  If  A  and  B  are  two  sets,  then  either  there  is  a  one-one  function 
from  A  into  B  or  there  is  a  one-one  function  from  B  into  A. 

PROOF. Consider the set S of all one-one functions f : E → B with E ⊆ A,
the empty function with E = ∅ being one such. Each such function is a certain
subset of A × B. If we order S by inclusion upward, then the union of the members
of any chain is an upper bound for the chain. By Zorn’s Lemma let $G : E_0 \to B$
be a maximal one-one function of this kind, and let $F_0$ be the image of G. If
$E_0 = A$, then G is a one-one function from A into B. If $F_0 = B$, then $G^{-1}$
is a one-one function from B into A. If neither of these things happens, then
there exist $x_0 \in A - E_0$ and $y_0 \in B - F_0$, and the function G' equal to G on
$E_0$ and having $G'(x_0) = y_0$ extends G and is still one-one; thus it contradicts the
maximality of G. □
Corollary. If E is an infinite set, then E has a countably infinite subset.

PROOF. The proposition shows that either there is a one-one function from the
set of positive integers into E, in which case we are done, or there is a one-one
function from E into the set of positive integers. In the latter case the image cannot
be finite since E is assumed infinite. Then the image must be an infinite subset
of the positive integers. This set can be enumerated and is therefore countably
infinite. Thus E itself is countably infinite, and so E is a countably infinite subset
of itself. □

Cantor’s  proof  that  there  exist  uncountable  sets,  done  with  a  diagonal  argument, 
in  fact  showed  how  to  start  from  any  set  A  and  construct  a  set  with  strictly  larger 
cardinality. 

Proposition (Cantor). If A is a set and $2^A$ denotes the set of all subsets of A,
then card $2^A$ is strictly larger than card A.

PROOF. The map x ↦ {x} is a one-one function from A into $2^A$. If we are
given a one-one function $F : A \to 2^A$, let E be the set of all x in A such that x
is not in F(x). If we had $E = F(x_0)$, then $x_0 \in E$ would imply $x_0 \notin F(x_0) = E$,
while $x_0 \notin E$ would imply $x_0 \in F(x_0) = E$. We have a contradiction in any case, and
hence E cannot be of the form $F(x_0)$. We conclude that F cannot be onto $2^A$. □
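For a finite set A the diagonal argument can be checked exhaustively. The Python sketch below (hypothetical names) forms E = {x ∈ A : x ∉ F(x)} for every function F from A = {0, 1, 2} into $2^A$, one-one or not, and confirms that E is never a value of F. The check illustrates that the argument in the proof never actually uses the hypothesis that F is one-one.

```python
from itertools import combinations, product


def power_set(A):
    """All subsets of A, as frozensets; this is 2^A."""
    return [frozenset(c) for r in range(len(A) + 1) for c in combinations(A, r)]


def diagonal_set(A, F):
    """E = {x in A : x not in F(x)}, for F given as a dict from A to subsets of A."""
    return frozenset(x for x in A if x not in F[x])


A = [0, 1, 2]
# Each tuple `values` lists F(0), F(1), F(2); all 8**3 = 512 functions are tried.
missed_every_time = all(
    diagonal_set(A, dict(zip(A, values))) not in values
    for values in product(power_set(A), repeat=len(A))
)
print(missed_every_time)
```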

Proposition.  If  E  is  an  infinite  set,  then  E  is  the  disjoint  union  of  sets  that  are 
each  countably  infinite. 

PROOF. Let S be the set of all disjoint unions of countably infinite subsets of
E. If $A = \bigcup_\alpha A_\alpha$ and $B = \bigcup_\beta B_\beta$ are members of S, say that A ≤ B if each
$A_\alpha$ is some $B_\beta$. The result is a partial ordering on S. If 𝒰 is a chain in S, then
the collection C of all countably infinite sets that are $U_\alpha$’s in some member of 𝒰
is a collection of countably infinite subsets of E that contains each member of 𝒰.
If $U_\alpha$ and $U_\beta$ are distinct members of C, then $U_\alpha$ and $U_\beta$ must both be in some
member of 𝒰 and hence must be disjoint. Thus C is an upper bound for 𝒰. Also,
the empty union is a member of S. By Zorn’s Lemma, S has a maximal element
M. Let F be the union of the members of M. If E − F were to be infinite, then
the corollary above would show that E − F has a countably infinite subset Z,
and M ∪ {Z} would contradict the maximality of M. Thus E − F is finite. Since
E is infinite and E − F is finite, F is nonempty, and thus M has some member T.
The set $T' = T \cup (E - F)$ is countably infinite, and $(M - \{T\}) \cup \{T'\}$ is the required
decomposition of E as the disjoint union of countably infinite sets. □

Corollary. Let S and E be nonempty sets with S infinite, and suppose that to
each element s of S is associated a countable subset $E_s$ of E in such a way that
$E = \bigcup_{s \in S} E_s$. Then card E ≤ card S.
PROOF. The proposition allows us to write S as the disjoint union of countably infinite sets. If U is one of these sets, then $E_U = \bigcup_{s \in U} E_s$ is countable, being the countable union of countable sets. Therefore there exists a function from U onto $E_U$. The union of these functions, as U varies, yields a function f from S onto $\bigcup_U E_U = E$. Applying the Axiom of Choice, we can select, for each $e \in E$, an element $s \in f^{-1}(\{e\})$ and call it $g(e)$. The result is a one-one function g from E into S, and consequently $\operatorname{card} E \le \operatorname{card} S$. □

Addition  is  well  defined  for  cardinals:  the  sum  of  two  cardinal  numbers  is 
defined  to  be  the  cardinality  of  the  disjoint  union  of  the  two  sets  in  question.  If 
at  least  one  of  the  two  cardinals  is  infinite,  the  sum  equals  the  larger  of  the  two, 
as  an  immediate  consequence  of  the  above  corollary. 


HINTS  FOR  SOLUTIONS  OF  PROBLEMS 


Chapter  I 


1.  582. 

2. The Euclidean algorithm gives $11 = 1 \cdot 7 + 4$, $7 = 1 \cdot 4 + 3$, $4 = 1 \cdot 3 + 1$, $3 = 3 \cdot 1 + 0$. So the GCD is 1. Reversing the steps gives
$$1 = 4 - 1 \cdot 3 = (11 - 1 \cdot 7) - 1 \cdot (7 - 1 \cdot 4) = (11 - 1 \cdot 7) - 1 \cdot \bigl(7 - 1 \cdot (11 - 1 \cdot 7)\bigr) = 2 \cdot 11 - 3 \cdot 7.$$
So $(x, y) = (2, -3)$ is a solution in (a). For (b), the difference of any two solutions solves $11x + 7y = 0$, and the solutions of this are of the form $(x, y) = n(7, -11)$.
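The back-substitution above is what the extended Euclidean algorithm automates. Here is a minimal Python sketch that carries the coefficients along with the remainders. The name `extended_gcd` is our own.

```python
def extended_gcd(a, b):
    """Return (g, x, y) with g = gcd(a, b) = a*x + b*y (iterative Euclid)."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    return old_r, old_x, old_y

g, x, y = extended_gcd(11, 7)
print(g, x, y)   # 1 2 -3, matching 1 = 2*11 - 3*7

# Every solution of 11x + 7y = 1 differs from (2, -3) by a multiple of (7, -11).
for n in range(-3, 4):
    assert 11 * (x + 7 * n) + 7 * (y - 11 * n) == 1
```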

3. Let $d_n = \mathrm{GCD}(a_1, \dots, a_n)$. The sequence $d_n$ is a monotone decreasing sequence of positive integers, and it must eventually be constant. This eventual constant value is d, and thus $d_n = d$ for suitably large n.

4. These n's divide $x + y - 2$ and the sum of $2x - 3y - 3$ and $-2$ times $x + y - 2$, hence divide $x + y - 2$ and $-5y + 1$. A necessary and sufficient condition for $-5y + 1 = na$ to be solvable for the pair $(a, y)$ is that $\mathrm{GCD}(5, n) = 1$, by Proposition 1.2c. Let us see that the answer to the problem is $\mathrm{GCD}(5, n) = 1$.

The n's we seek must further divide $5(x + y - 2) = 5x + 5y - 10$ and $-5y + 1$, hence also the sum $5x - 9$, as well as $-5y + 1$. If $\mathrm{GCD}(5, n) = 1$, then $5x - 9 = nb$ is solvable for $(b, x)$. With our solutions $(a, y)$ and $(b, x)$, we have $5x + 5y - 10 = n(b - a)$. Since 5 divides the left side and $\mathrm{GCD}(5, n) = 1$, 5 divides $b - a$. Write $b - a = 5c$. Then $x + y - 2 = nc$ and $-5y + 1 = na$, and we obtain $2x - 3y - 3 = n(2c + a)$.
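We read the problem, as the hint does, as asking for which positive integers n there are integers x and y with n dividing both $x + y - 2$ and $2x - 3y - 3$. The problem statement itself is not reproduced here, so this is our paraphrase. Under that reading, the answer $\mathrm{GCD}(5, n) = 1$ can be spot-checked by brute force, since only residues modulo n matter.

```python
from math import gcd

def has_common_solution(n):
    """Is there (x, y) with n | x + y - 2 and n | 2x - 3y - 3?
    Only residues mod n matter, so searching 0 <= x, y < n suffices."""
    return any((x + y - 2) % n == 0 and (2 * x - 3 * y - 3) % n == 0
               for x in range(n) for y in range(n))

for n in range(1, 61):
    assert has_common_solution(n) == (gcd(5, n) == 1)
```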

5. $Q(X) = (X - 1)P(X) + (X^3 + X^2 + X + 1)$, $P(X) = X(X^3 + X^2 + X + 1) + (X^2 + 1)$, $X^3 + X^2 + X + 1 = (X + 1)(X^2 + 1) + 0$. Hence the GCD is $D(X) = X^2 + 1$. For (b), we retrace the steps, letting $R(X) = X^3 + X^2 + X + 1$. We have
$$D(X) = P(X) - XR(X) = P(X) - X\bigl(Q(X) - (X - 1)P(X)\bigr) = (X^2 - X + 1)P(X) - XQ(X).$$
Thus $A(X) = X^2 - X + 1$ and $B(X) = -X$.
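The division steps can be replayed mechanically. This sketch uses exact rational arithmetic through `fractions.Fraction` and stores coefficients lowest degree first. P and Q are rebuilt from the relations in the hint, $P = XR + (X^2 + 1)$ and $Q = (X - 1)P + R$. The helper names are our own.

```python
from fractions import Fraction

def trim(p):
    """Drop zero coefficients at the top (coefficients listed lowest degree first)."""
    p = [Fraction(a) for a in p]
    while p and p[-1] == 0:
        p.pop()
    return p

def add(p, q):
    n = max(len(p), len(q))
    return trim([(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0)
                 for i in range(n)])

def scale(c, p):
    return trim([c * a for a in p])

def mul(p, q):
    out = [Fraction(0)] * max(len(p) + len(q) - 1, 0)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return trim(out)

def poly_rem(p, q):
    """Remainder of p on long division by the nonzero polynomial q."""
    rem, q = trim(p), trim(q)
    while len(rem) >= len(q):
        shift = len(rem) - len(q)
        rem = add(rem, scale(-rem[-1] / q[-1], [0] * shift + q))
    return rem

def poly_gcd(p, q):
    """Euclidean algorithm; the result is normalized to leading coefficient 1."""
    p, q = trim(p), trim(q)
    while q:
        p, q = q, poly_rem(p, q)
    return scale(1 / p[-1], p)

X = [0, 1]
R = [1, 1, 1, 1]                       # X^3 + X^2 + X + 1
P = add(mul(X, R), [1, 0, 1])          # P = X R + (X^2 + 1)
Q = add(mul([-1, 1], P), R)            # Q = (X - 1) P + R

print(poly_gcd(P, Q))                  # coefficients of X^2 + 1
A_coef, B_coef = [1, -1, 1], [0, -1]   # A = X^2 - X + 1, B = -X
assert add(mul(A_coef, P), mul(B_coef, Q)) == [1, 0, 1]
```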

6. The computation via the Euclidean algorithm, done within $\mathbb C[X]$, retains real numbers as coefficients throughout. By Proposition 1.15a one GCD has real coefficients. By Proposition 1.15c any GCD is a complex multiple of this polynomial with real coefficients.

7. In (a), we may assume, without loss of generality, that P has leading coefficient 1, so that $P(X) = X^n + a_{n-1}X^{n-1} + \cdots + a_0 = \prod_j (X - z_j)^{m_j}$. Define $Q(X) = \prod_j (X - \bar z_j)^{m_j}$. Then $Q(\bar z) = \prod_j (\bar z - \bar z_j)^{m_j} = \overline{\prod_j (z - z_j)^{m_j}} = \overline{P(z)}$. Replacing z by $\bar z$ gives $Q(z) = \overline{P(\bar z)} = z^n + \bar a_{n-1}z^{n-1} + \cdots + \bar a_0$.

615 


616 


Hints  for  Solutions  of  Problems 


Since P has real coefficients, $Q(z) = P(z)$ for all z. Then $Q - P$ has every z as a root and in particular has more than n roots. Hence it must be the 0 polynomial. So $\prod_j (X - \bar z_j)^{m_j} = \prod_j (X - z_j)^{m_j}$, and the result follows from unique factorization (Theorem 1.17).

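In (b), the result of (a) shows that we may factor any real polynomial in $\mathbb C[X]$ with leading coefficient 1 in the form
$$\prod_{x_j \text{ real}} (X - x_j)^{m_j} \prod_{z_j \text{ nonreal}} \bigl((X - z_j)^{n_j}(X - \bar z_j)^{n_j}\bigr),$$
where the second product runs over one member $z_j$ of each pair of nonreal conjugate roots. The right side equals
$$\prod_{x_j \text{ real}} (X - x_j)^{m_j} \prod_{z_j \text{ nonreal}} \bigl(X^2 - (z_j + \bar z_j)X + z_j\bar z_j\bigr)^{n_j}.$$
Every factor on the right side is in $\mathbb R[X]$, and the only way that the polynomial can be prime in $\mathbb R[X]$ is if only one factor is present. Thus the polynomial has degree at most 2.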

8. For (a), let $\deg A = d$ and form the equation $A(p/q) = 0$. Multiply through by $q^d$ in order to clear fractions. Every term in the equation except the leading term has q as a factor, and thus q divides the leading term $p^d$. Since $\mathrm{GCD}(p, q) = 1$, no prime can divide q. Thus $q = \pm 1$, and $n = p/q$ is an integer. Forming the equation $A(n) = 0$, we see that n is a factor of each term except possibly the constant term $a_0$. Thus n divides $a_0$.

For (b), we apply (a) to both polynomials. The only possible rational roots of $X^2 - 2$ are $\pm 1$ and $\pm 2$, while the only possible rational roots of $X^3 + X^2 + 1$ are $\pm 1$. Checking directly, we see that none of these possibilities is actually a root. By the Factor Theorem, neither $X^2 - 2$ nor $X^3 + X^2 + 1$ has a first-degree factor in $\mathbb Q[X]$. If a polynomial of degree $\le 3$ has a nontrivial factorization, then it has a first-degree factor. We conclude that $X^2 - 2$ and $X^3 + X^2 + 1$ are prime.
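For monic polynomials with integer coefficients and nonzero constant term, part (a) reduces the search for rational roots to a finite check. Here is a small Python sketch with hypothetical helper names. Coefficients are listed lowest degree first.

```python
def integer_root_candidates(coeffs):
    """coeffs = [a0, a1, ..., 1]: a monic integer polynomial with a0 != 0.
    By part (a), any rational root is an integer dividing a0."""
    a0 = abs(coeffs[0])
    divisors = [d for d in range(1, a0 + 1) if a0 % d == 0]
    return [s * d for d in divisors for s in (1, -1)]

def evaluate(coeffs, x):
    """Horner evaluation of the polynomial at x."""
    value = 0
    for c in reversed(coeffs):
        value = value * x + c
    return value

def rational_roots(coeffs):
    return [r for r in integer_root_candidates(coeffs) if evaluate(coeffs, r) == 0]

print(integer_root_candidates([-2, 0, 1]))   # X^2 - 2: [1, -1, 2, -2]
print(rational_roots([-2, 0, 1]))            # []
print(rational_roots([1, 0, 1, 1]))          # X^3 + X^2 + 1: []
```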

9.  Computation  gives  GCD(8645,  10465)  =  455.  Therefore  8645/10465  equals 
19/23  in  lowest  terms. 

10. Apart from the identity, the cycle structures are those of (1 2) with 6 representatives, (1 2 3) with 8 representatives, (1 2 3 4) with 6 representatives, and (1 2)(3 4) with 3 representatives. This checks, since $1 + 6 + 8 + 6 + 3 = 24 = 4!$, the number of permutations in all.
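The count can be confirmed by listing all permutations of four symbols and recording their cycle lengths. A short Python sketch:

```python
from collections import Counter
from itertools import permutations

def cycle_type(perm):
    """Cycle lengths, largest first, of a permutation of 0..n-1 given as a tuple."""
    seen, lengths = set(), []
    for start in range(len(perm)):
        if start in seen:
            continue
        length, i = 0, start
        while i not in seen:
            seen.add(i)
            i = perm[i]
            length += 1
        lengths.append(length)
    return tuple(sorted(lengths, reverse=True))

counts = Counter(cycle_type(p) for p in permutations(range(4)))
# Keys: (2, 1, 1) like (1 2), (3, 1) like (1 2 3), (4,) like (1 2 3 4),
# (2, 2) like (1 2)(3 4), and (1, 1, 1, 1) for the identity.
for shape, number in sorted(counts.items()):
    print(shape, number)
```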

11. Check that the function $\sigma \mapsto \sigma(1\ 2)$ is one-one from the set of permutations of sign $+1$ onto the set of permutations of sign $-1$.

12.  (a)  *3  ^-2^.  (b)  None,  (c)  ^  io/3  ^  +  *3  ^  -2  ^ . 

13. By the definition of “step,” an interchange of two rows (type (i)) takes n steps, and a multiplication of a row by a nonzero scalar (type (ii)) takes n steps. Also, replacement of a row by the sum of it and a multiple of another row (type (iii)) takes 2n steps. We proceed through the row-reduction algorithm column by column. For each of the n columns, we do possibly one operation of type (i) and then possibly an operation of type (ii). This much requires $\le 2n$ steps. Then we do at most $n - 1$ operations of type (iii), requiring $\le 2n(n - 1)$ steps. Thus a single column is handled in $\le 2n(n - 1) + 2n = 2n^2$ steps, and the entire row reduction requires $\le 2n^3$ steps.

14.  =(:!!“)■ 

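15. We induct on n, the result being clear for $n = 1$. Taking into account the fact that B commutes with A, we have
$$\begin{aligned}
(A + B)^n &= (A + B)(A + B)^{n-1} = (A + B)\sum_{k=0}^{n-1}\binom{n-1}{k}A^{n-1-k}B^k \\
&= \sum_{k=0}^{n-1}\binom{n-1}{k}A^{n-k}B^k + \sum_{k=0}^{n-1}\binom{n-1}{k}A^{n-1-k}B^{k+1} \\
&= \sum_{k=0}^{n-1}\binom{n-1}{k}A^{n-k}B^k + \sum_{k=1}^{n}\binom{n-1}{k-1}A^{n-k}B^{k} \\
&= A^n + \sum_{k=1}^{n-1}\left[\binom{n-1}{k} + \binom{n-1}{k-1}\right]A^{n-k}B^k + B^n.
\end{aligned}$$
In turn, the right side equals $\sum_{k=0}^{n}\binom{n}{k}A^{n-k}B^k$ by the Pascal-triangle identity for binomial coefficients.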
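Here is a quick exact check of the identity. It takes B to be a polynomial in A (here $B = A^2 + I$) so that the two commute. The last line shows the expansion failing for a pair of matrices that do not commute. Integer matrices are stored as nested lists, and the particular entries are arbitrary choices of ours.

```python
from math import comb

def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]

def matadd(x, y):
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]

def scal(c, x):
    return [[c * a for a in row] for row in x]

def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]

def matpow(x, k):
    result = identity(len(x))
    for _ in range(k):
        result = matmul(result, x)
    return result

def binomial_expansion(a, b, n):
    """Sum over k of C(n, k) A^(n-k) B^k."""
    total = scal(0, a)
    for k in range(n + 1):
        total = matadd(total, scal(comb(n, k), matmul(matpow(a, n - k), matpow(b, k))))
    return total

A = [[1, 2], [3, 4]]
B = matadd(matmul(A, A), identity(2))   # B = A^2 + I commutes with A
for n in range(1, 7):
    assert matpow(matadd(A, B), n) == binomial_expansion(A, B, n)

C = [[0, 1], [0, 0]]                    # does not commute with A
print(matpow(matadd(A, C), 2) == binomial_expansion(A, C, 2))   # False
```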


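16. Write
$$\begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix} \quad\text{as}\quad I + B, \quad\text{where}\quad B = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix},$$
and apply Problem 15. Since
$$B^2 = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}$$
and $B^3 = 0$, we obtain
$$(I + B)^n = I + nB + \tfrac12 n(n - 1)B^2 = \begin{pmatrix} 1 & n & \tfrac12 n(n - 1) \\ 0 & 1 & n \\ 0 & 0 & 1 \end{pmatrix}.$$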
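The closed form can be compared against repeated multiplication. A minimal Python sketch:

```python
def matmul(x, y):
    """Product of matrices stored as lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]

def closed_form(n):
    """The matrix I + nB + n(n-1)/2 B^2 from the hint."""
    return [[1, n, n * (n - 1) // 2], [0, 1, n], [0, 0, 1]]

M = [[1, 1, 0], [0, 1, 1], [0, 0, 1]]       # I + B
power = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]   # M^0
for n in range(1, 11):
    power = matmul(power, M)                # now M^n
    assert power == closed_form(n)
print(power)   # [[1, 10, 45], [0, 1, 10], [0, 0, 1]]
```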

17. $(AD)_{ij} = A_{ij}d_j$ and $(DA)_{ij} = d_iA_{ij}$. Thus $AD = DA$ if and only if $d_i = d_j$ for all $(i, j)$ for which $A_{ij} \ne 0$.

18. $E_{kl}E_{pq} = \delta_{lp}E_{kq}$.

19. Check that the given matrix times the asserted inverse is the identity. Then the asserted matrix actually is the inverse. Apply the inverse to both sides of the given system to obtain the values of the unknowns.


20.  (a)  No  inverse. 


'  -2/3  -4/3  1 
(b)  A~l  =  (  -2/3  11/3  -2 


(C)  A"1  = 


1  -1  0 
-2  3  -1 

2-5  4 


1  -2  1 

21.  No.  If  the  algorithm  is  followed,  then  the  row  of  0’s  persists  throughout  the 
row  reduction,  at  worst  moving  to  a  different  row  at  various  stages. 

22. If $C = (AB)^{-1}$, then $ABC = I$ shows that $BC$ is the inverse of A, and $CAB = I$ shows that $CA$ is the inverse of B.

23. $(I + A)\bigl(I - A + A^2 - A^3 + \cdots + (-A)^{k-1}\bigr) = I - (-A)^k = I$, since $A^k = 0$, and this shows that $I - A + A^2 - A^3 + \cdots + (-A)^{k-1}$ is an inverse.
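Here is a small check with a strictly upper triangular 3-by-3 matrix, for which $A^3 = 0$, so $k = 3$. The particular entries are arbitrary choices of ours.

```python
def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]

def matadd(x, y):
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]

n, k = 3, 3
I = [[int(i == j) for j in range(n)] for i in range(n)]
A = [[0, 2, -1], [0, 0, 5], [0, 0, 0]]      # strictly upper triangular: A^3 = 0
neg_A = [[-a for a in row] for row in A]

series = [[0] * n for _ in range(n)]
term = I                                    # (-A)^0
for _ in range(k):                          # I - A + A^2 - ... + (-A)^(k-1)
    series = matadd(series, term)
    term = matmul(term, neg_A)

assert term == [[0] * n for _ in range(n)]  # (-A)^k = 0
assert matmul(matadd(I, A), series) == I
assert matmul(series, matadd(I, A)) == I
print(series)   # [[1, -2, 11], [0, 1, -5], [0, 0, 1]]
```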


24. Let S be the set of positive integers, and let $f(n) = n + 1$. Take $g(n)$ to be $n - 1$ for $n > 1$ and $g(1) = 1$. Then $g \circ f$ is the identity. But f is not onto S, and g is not one-one.

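25. Take $A = \begin{pmatrix} 1 & 0 \end{pmatrix}$ and $B = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$. Then $AB = (1)$, while $BA = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}$ is not the identity. More generally, if $A = \begin{pmatrix} a & b \end{pmatrix}$ and $B = \begin{pmatrix} c \\ d \end{pmatrix}$, then $BA = \begin{pmatrix} ca & cb \\ da & db \end{pmatrix}$. If the upper right entry is 0, then $c = 0$ or $b = 0$. But then one of the two diagonal entries must be 0, and hence BA cannot be the identity.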

26.  The  set  of  common  multiples  is  a  nonempty  set  of  positive  integers  because 
ab  is  in  it.  Therefore  it  has  a  least  element. 


27.  This  is  a  restatement  of  Corollary  1.7. 

28. Let a and b have prime factorizations $a = p_1^{k_1}\cdots p_r^{k_r}$ and $b = p_1^{l_1}\cdots p_r^{l_r}$. Problem 27 shows that any positive common multiple N of a and b is of the form $p_1^{m_1}\cdots p_r^{m_r}q_1^{n_1}\cdots q_s^{n_s}$, where $q_1, \dots, q_s$ are primes distinct from the $p_j$, with $m_j \ge k_j$, $m_j \ge l_j$, and $n_j \ge 0$, and certainly any positive integer of this form is a common multiple. The inequalities for $m_j$ are equivalent to the condition $m_j \ge \max(k_j, l_j)$. The smallest positive integer of this kind has $m_j = \max(k_j, l_j)$ and $n_j = 0$. This proves (a). In combination with the form of N, the formula for $\mathrm{LCM}(a, b)$ proves (b). Conclusion (c) follows from Corollary 1.8 and the identity $k_j + l_j = \min(k_j, l_j) + \max(k_j, l_j)$.

29. If $a_j = p_1^{k_{1j}}\cdots p_r^{k_{rj}}$ is a prime factorization of $a_j$, then
$$\mathrm{LCM}(a_1, \dots, a_t) = p_1^{\max_{1 \le j \le t} k_{1j}} \cdots p_r^{\max_{1 \le j \le t} k_{rj}},$$
just as with Corollary 1.11.
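The max-exponent recipe is easy to compare against Python's built-in `math.lcm`, which accepts several arguments from Python 3.9 on. The trial-division factorization below is a simple illustration, not an efficient method.

```python
from collections import Counter
from math import lcm   # Python 3.9+

def factorize(m):
    """Prime factorization of m >= 1 by trial division, as a Counter prime -> exponent."""
    f, p = Counter(), 2
    while p * p <= m:
        while m % p == 0:
            f[p] += 1
            m //= p
        p += 1
    if m > 1:
        f[m] += 1
    return f

def lcm_from_factorizations(*nums):
    """LCM as the product of p ** (max over j of the exponent of p in a_j)."""
    exps = Counter()
    for a in nums:
        for p, k in factorize(a).items():
            exps[p] = max(exps[p], k)
    result = 1
    for p, k in exps.items():
        result *= p ** k
    return result

print(lcm_from_factorizations(12, 18, 40))   # 2^3 * 3^2 * 5 = 360
for nums in [(4, 6), (7, 13, 91), (8, 9, 10, 11)]:
    assert lcm_from_factorizations(*nums) == lcm(*nums)
```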

Chapter  II 

1.  The  methods  at  the  end  of  Section  2  lead  to  the  basis  {(|,  1,0),  (—|,0,  1)}  for 
(a)  and  to  the  basis  { ( 1 ,  —  ^ ,  2) }  for  (b). 

2. For $0 < k < n$, the two recursive formulas and one application of associativity give
$$v^{(k+1)} + v_{(k+2)} = \bigl(v^{(k)} + v_{k+1}\bigr) + v_{(k+2)} = v^{(k)} + \bigl(v_{k+1} + v_{(k+2)}\bigr) = v^{(k)} + v_{(k+1)},$$
and (a) follows.

For (b), we proceed by induction on n, the cases $n \le 3$ being handled by associativity. Suppose that the result holds for sums of fewer than n vectors, with $n \ge 4$. In a sum of n vectors, there is some outer plus sign, and the inductive hypothesis means that the sum is of the form $(v_1 + \cdots + v_k) + (v_{k+1} + \cdots + v_n)$, the expressions $v_1 + \cdots + v_k$ and $v_{k+1} + \cdots + v_n$ being unambiguous. The inductive hypothesis means that we have $v_1 + \cdots + v_k = v^{(k)}$ and $v_{k+1} + \cdots + v_n = v_{(k+1)}$, and hence the expression we are studying is of the form $v^{(k)} + v_{(k+1)}$. Part (a) shows that this is independent of k, and hence (b) follows.
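To see the content of (b) concretely, one can enumerate every full parenthesization of a sum of five terms and evaluate each one. The sketch uses string concatenation as a stand-in for vector addition. Concatenation is associative but not commutative, so the agreement of all results reflects associativity alone.

```python
def bracketings(items):
    """All values obtained by fully parenthesizing items[0] + ... + items[-1],
    one value per bracketing, with string concatenation as the operation."""
    if len(items) == 1:
        return [items[0]]
    values = []
    for k in range(1, len(items)):           # position of the outermost plus sign
        for left in bracketings(items[:k]):
            for right in bracketings(items[k:]):
                values.append(left + right)
    return values

vals = bracketings(("v1", "v2", "v3", "v4", "v5"))
print(len(vals))   # 14 bracketings (a Catalan number)
print(set(vals))   # {'v1v2v3v4v5'}: all bracketings agree
```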

3. From Section 1.4, σ is a product of transpositions, and hence it is enough to prove the result for a transposition. When $r + 1 < s$, iteration of the identity $(r\ \ s) = (r\ \ r+1)(r+1\ \ s)(r\ \ r+1)$ shows that any transposition is a product of transpositions of the form $(r\ \ r+1)$, and hence it is enough to prove the formula for $\sigma = (r\ \ r+1)$. This case is just the commutative law, and the result follows.
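Iterating the identity amounts to a short recursion that produces a word in adjacent transpositions. In this sketch the helper names are hypothetical, permutations of {1, ..., n} are stored as tuples, and products are composed right to left. It checks the word against $(r\ \ s)$ for every $r < s \le 6$.

```python
def transposition(n, i, j):
    """The permutation of {1, ..., n} swapping i and j; p[x - 1] is the image of x."""
    p = list(range(1, n + 1))
    p[i - 1], p[j - 1] = j, i
    return tuple(p)

def compose(p, q):
    """(p q)(x) = p(q(x)): apply q first, then p."""
    return tuple(p[q[x] - 1] for x in range(len(q)))

def adjacent_word(r, s):
    """Adjacent transpositions (a, a+1) whose product is (r s), for r < s,
    obtained by iterating (r s) = (r r+1)(r+1 s)(r r+1)."""
    if s == r + 1:
        return [(r, r + 1)]
    return [(r, r + 1)] + adjacent_word(r + 1, s) + [(r, r + 1)]

n = 6
for r in range(1, n):
    for s in range(r + 1, n + 1):
        prod = tuple(range(1, n + 1))        # identity
        for a, b in adjacent_word(r, s):
            prod = compose(prod, transposition(n, a, b))
        assert prod == transposition(n, r, s)
print(adjacent_word(1, 4))   # [(1, 2), (2, 3), (3, 4), (2, 3), (1, 2)]
```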

4.  (a)  {( 1  2  — 1 ) ,  (0  0  1)};  (b)  j)J;  (c)  2. 

5. If R is a reduced row-echelon form of A, then we know that $R = EA$, where E is a product of invertible elementary matrices. Since A has rank one, R has a single nonzero row r and is of the form $e_1 r$, where $e_1$ is the first standard basis vector. Then $A = E^{-1}R = (E^{-1}e_1)r$, and we can take $c = E^{-1}e_1$.

6. In (a), let $u_1, \dots, u_s$ be the rows of R having at least one of the first r entries nonzero, and let $u_{s+1}, \dots, u_m$ be the other rows. For each i with $1 \le i \le s$, the first nonzero entry of $u_i$ corresponds to a corner variable and occurs in the $j(i)$th position with $j(i) \le r$. The most general member of the row space of A is of the form $c_1u_1 + \cdots + c_mu_m$, and the $j(i)$th entry of this is $c_i$. For this row vector to be in the indicated span, we must have $c_i = 0$ for $i \le s$.

In (b), let R' be a second reduced row-echelon form, and let its nonzero rows be $v_1, \dots, v_m$. From part (a), it follows that the linear span of $u_{s+1}, \dots, u_m$ equals the linear span of $v_{s+1}, \dots, v_m$ for each s. Moreover, the value of each $j(i)$ has to be the same for $u_i$ as for $v_i$. Inducting downward, we prove that $u_i = v_i$ for each i. For $i = m$, this follows since the first nonzero entry is 1 for both $u_m$ and $v_m$. Assuming the result for $s + 1$, we write $v_s = c_su_s + c_{s+1}u_{s+1} + \cdots + c_mu_m$. We have $c_s = 1$ since the first nonzero entry of $u_s$ and $v_s$ is 1, and we have $c_{s+i} = 0$ for $i > 0$ since the $j(s + i)$th entry of this equality of row vectors is $0 = c_{s+i}$. Thus $v_s = u_s$, and the induction is complete.
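Part (b) says that row-equivalent matrices have the same reduced row-echelon form. A Gauss-Jordan sketch over the rationals lets one test this on an example. Multiplying A on the left by an invertible E (here with determinant $-5$) must leave the reduced form unchanged. Both matrices are arbitrary choices of ours.

```python
from fractions import Fraction

def rref(m):
    """Reduced row-echelon form over the rationals (Gauss-Jordan elimination)."""
    a = [[Fraction(x) for x in row] for row in m]
    rows, cols = len(a), len(a[0])
    lead = 0
    for c in range(cols):
        pivot = next((r for r in range(lead, rows) if a[r][c] != 0), None)
        if pivot is None:
            continue
        a[lead], a[pivot] = a[pivot], a[lead]
        a[lead] = [x / a[lead][c] for x in a[lead]]      # corner entry becomes 1
        for r in range(rows):
            if r != lead and a[r][c] != 0:
                factor = a[r][c]
                a[r] = [x - factor * y for x, y in zip(a[r], a[lead])]
        lead += 1
        if lead == rows:
            break
    return a

def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]

A = [[1, 2, 1, 0], [2, 4, 0, 2], [3, 6, 1, 2]]
E = [[2, 1, 0], [1, 1, 1], [0, 3, 1]]    # invertible, det E = -5
assert rref(A) == rref(matmul(E, A))
print(rref(A))
```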


7. Let $E = \{x_1, \dots, x_N\}$, and let $f_1, \dots, f_n$ be a basis of U. Form the matrix
$$A = \begin{pmatrix} f_1(x_1) & \cdots & f_1(x_N) \\ \vdots & \ddots & \vdots \\ f_n(x_1) & \cdots & f_n(x_N) \end{pmatrix}.$$
By assumption, A has row rank n. Therefore it has column rank n, and there exist n linearly independent columns, say columns $j_1, \dots, j_n$. Then $D = \{x_{j_1}, \dots, x_{j_n}\}$.

Let  the  listed  basis  be  F,  and  let  Z  be  the  standard  basis.  Then  ^  ^  j  = 


(  3  \  ) ,  the  inverse  matrix  is  (  1  ) 

o;)(TS)(-5D-(»> 


(“)•  and  (rr)  > 


is  the  product 


9. One could compute the matrix of $I - D^2$ in an explicit basis, but an easier way is to observe that $D^3 = 0$ and hence $(I - D^2)(I + D^2) = I - D^4 = I$.


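10. Since $\operatorname{image}(AB) \subseteq \operatorname{image} A$, we have $\operatorname{rank}(AB) = \dim\operatorname{image}(AB) \le \dim\operatorname{image} A = \operatorname{rank} A$. Similarly $\operatorname{rank}((AB)^t) = \operatorname{rank}(B^tA^t) \le \operatorname{rank} B^t$. Since a matrix and its transpose have the same rank (by the equality of row rank and column rank), $\operatorname{rank}(AB) \le \operatorname{rank} B$.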


11. Since A has n columns, $\operatorname{rank} A \le n$. Applying Problem 10 gives $\operatorname{rank}(AB) \le \operatorname{rank} A \le n$. Since $n < k = \operatorname{rank} I$, we cannot have $AB = I$.

12.  Take  A  =  ^  and  B  —  Then  AB  —  B  has  rank  1  while  BA  =  0 

has  rank  0. 


13. $\{\cosh t, \sinh t\}$.

14. Let $\{v_n \mid n \in \mathbb Z\}$ be a countably infinite basis. For each subset S of $\mathbb Z$, define $v'_S$ to be the member of V' such that $v'_S(v_n)$ is 1 if n is in S and is 0 if not. Choose by Theorem 2.42 a subset of $\{v'_S\}$ that is a basis for the linear span of all $v'_S$. Arguing by contradiction, assume that this basis is countable. Number the S's in question as $S_1, S_2, \dots$. Any $v'_S$ then has a unique expansion as $v'_S = c_1v'_{S_1} + \cdots + c_kv'_{S_k}$ for some k. Fix k, and let $v'_S$ be expandable for this k. Let $E \subseteq \{1, \dots, k\}$. Let m and n be such that m and n are in $S_j$ for j in E and are not in $S_j$ for j in $\{1, \dots, k\} - E$. Then $v'_{S_j}(v_m) = v'_{S_j}(v_n)$ for $j = 1, \dots, k$, and hence $v'_S(v_m) = v'_S(v_n)$. Thus S is a union of some of the at most $2^k$ sets of integers having a prescribed pattern of membership in $S_1, \dots, S_k$, and so, with k fixed, the number of S's for which $v'_S$ is expandable is at most $2^{2^k}$. In particular, it is finite. Taking the union over k, we find that there are only countably many $v'_S$ in the linear span of $v'_{S_1}, v'_{S_2}, \dots$. But there are uncountably many subsets S of $\mathbb Z$, and we have thus arrived at a contradiction. We conclude that our subset of all $v'_S$ that is a basis for the linear span must have been uncountable.

15. For (a), take L, M, and N to be the three 1-dimensional subspaces of $\mathbb R^2$ shown in Figure 2.1. Then $L \cap (M + N) = L$ while $(L \cap M) + (L \cap N) = 0$.

For (b), we always have $\supseteq$ since $L \cap (M + N) \supseteq L \cap M$ and $L \cap (M + N) \supseteq L \cap N$.

For (c), if $l = m + n$ is in $L \cap (M + N)$, then $L \supseteq M$ implies that $n = l - m$ is in L. So $l = m + n$ has $m \in L \cap M$ and $n \in L \cap N$.

16. Take M, $N_1$, and $N_2$ to be the three 1-dimensional subspaces of $V = \mathbb R^2$ shown in Figure 2.1. Then $M \oplus N_1 = M \oplus N_2 = \mathbb R^2$, but $N_1 \ne N_2$.

17. (b) only.

18. In $V_1 \oplus \cdots \oplus V_n$, let $p_j$ pick off the jth coordinate, and let $i_j$ carry $v_j$ to $(0, \dots, 0, v_j, 0, \dots, 0)$. Then $p_ri_s$ is I on $V_s$ if $r = s$ and is 0 on $V_s$ if $r \ne s$. Also, $\sum_{k=1}^n i_kp_k = I$ on $V_1 \oplus \cdots \oplus V_n$.

19. Corollary 2.15 shows that $\dim\ker T + \dim\operatorname{image} T = n$. Since $\ker T$ and $\operatorname{image} T$ have 0 intersection, the union of bases of $\ker T$ and $\operatorname{image} T$ is a linearly independent set of n vectors in $\mathbb R^n$. This set must be a basis of $\mathbb R^n$, and hence $\mathbb R^n = \ker T \oplus \operatorname{image} T$. This proves (a).

For (b), let $T^2 = T$ and suppose that v is in $\ker T \cap \operatorname{image} T$. Since v is in $\operatorname{image} T$, we have $v = T(w)$ for some w. Then $v = T(w) = T^2(w) = T(T(w)) = T(v)$, and the right side is 0 since v is in $\ker T$. Consequently $\ker T \cap \operatorname{image} T = 0$.

20. Define $L: V_1' \oplus V_2' \to (V_1 \oplus V_2)'$ by $L(v_1', v_2')(v_1, v_2) = v_1'(v_1) + v_2'(v_2)$.

21. Proposition 2.25 shows that $y \mapsto z$ is onto the subset of z's in V' such that $M \subseteq \ker z$, i.e., is onto $\operatorname{Ann} M$. Since q is onto $V/M$, $y \mapsto z$ is one-one.

22. The kernel of q is M, and thus the kernel of $q|_N$ is $M \cap N$. So $q|_N$ is one-one if and only if $M \cap N = 0$.

If $M + N = V$, then any $v \in V$ is of the form $m + n$, so v has $v + M = m + n + M = n + M = q(n)$, and q carries N onto $V/M$. Conversely, if q carries N onto $V/M$, let $v \in V$ be given, and choose n with $q(n) = v + M$. Then $n + M = v + M$, and hence $v - n$ is in M. This says that $V = M + N$.

Consequently $q|_N: N \to V/M$ is an isomorphism if and only if $M \cap N = 0$ and $M + N = V$, and we know from Proposition 2.30 that this pair of conditions is equivalent to the single condition $V = M \oplus N$.

23. If $A^{-1}$ has integer entries, then $\det A$ and $\det A^{-1}$ are integers that are reciprocals, and we conclude that $\det A = \pm 1$. If $\det A = \pm 1$, then Cramer's rule shows that $A^{-1}$ has integer entries.

24. When $r = \operatorname{rank} A$, there exist r linearly independent rows. Say that these are the ones numbered $i_1, \dots, i_r$. Let $A_1$ be the r-by-n matrix obtained by deleting the remaining rows. Since $A_1$ has rank r, it has r linearly independent columns. Say that these are the ones numbered $j_1, \dots, j_r$. Let $A_2$ be the r-by-r matrix obtained by deleting the remaining columns. Then $A_2$ is a square matrix of rank r, is therefore invertible, and must have nonzero determinant. In the reverse direction, if some s-by-s submatrix has nonzero determinant, then the rows of the submatrix are linearly independent, and certainly the corresponding rows of A are linearly independent. Thus $s \le \operatorname{rank} A$.
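The characterization of rank by nonvanishing minors can be tested against Gaussian elimination. The brute-force search over submatrices below takes exponential time and is meant only for small illustrative matrices. The example matrix is an arbitrary choice of ours.

```python
from fractions import Fraction
from itertools import combinations

def det(m):
    """Determinant by Gaussian elimination over the rationals."""
    a = [[Fraction(x) for x in row] for row in m]
    n, result = len(a), Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return result

def rank(m):
    """Number of pivots found by Gaussian elimination over the rationals."""
    a = [[Fraction(x) for x in row] for row in m]
    r = 0
    for c in range(len(a[0])):
        pivot = next((i for i in range(r, len(a)) if a[i][c] != 0), None)
        if pivot is None:
            continue
        a[r], a[pivot] = a[pivot], a[r]
        for i in range(r + 1, len(a)):
            f = a[i][c] / a[r][c]
            a[i] = [x - f * y for x, y in zip(a[i], a[r])]
        r += 1
    return r

def largest_nonzero_minor(m):
    """Largest s such that some s-by-s submatrix has nonzero determinant."""
    rows, cols = len(m), len(m[0])
    for s in range(min(rows, cols), 0, -1):
        for ri in combinations(range(rows), s):
            for ci in combinations(range(cols), s):
                if det([[m[i][j] for j in ci] for i in ri]) != 0:
                    return s
    return 0

A = [[1, 2, 3, 4], [2, 4, 6, 8], [0, 1, 1, 0]]
assert rank(A) == largest_nonzero_minor(A) == 2
```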

25. Let the expression in question be $f(t) = \sum_{i=1}^n a_ie^{c_it}$. Put $r_i = e^{c_i}$. The numbers $r_i$ are distinct. The fact that $f(0) = f(1) = \cdots = f(n - 1) = 0$ says that the product of the Vandermonde matrix formed from $r_1, \dots, r_n$ times the column vector $(a_1, \dots, a_n)$ is the 0 vector. Since the Vandermonde matrix is invertible, it follows that $(a_1, \dots, a_n)$ is the 0 vector.

26.  The  characteristic  polynomial  is  A2— 5A+ 6  =  (A— 2)(A— 3).  The  eigenvectors 
for  A  =  2  are  all  nonzero  multiples  of  '  j ,  and  the  eigenvectors  for  A  =  3  are  all 

nonzero  multiples  of  ^  ^ . 

27. $\sum_i (C^{-1}AC)_{ii} = \sum_{i,j,k} (C^{-1})_{ij}A_{jk}C_{ki} = \sum_{j,k} A_{jk}\sum_i C_{ki}(C^{-1})_{ij} = \sum_{j,k} A_{jk}\delta_{kj} = \sum_j A_{jj}$.

28. For $n = 2$, direct computation gives $\lambda^2 - a_1\lambda - a_0$. Similarly we obtain $\lambda^3 - a_2\lambda^2 - a_1\lambda - a_0$ when $n = 3$. We are thus led to the guess in the general case that the determinant is $\lambda^n - a_{n-1}\lambda^{n-1} - \cdots - a_1\lambda - a_0$. This is proved by induction, using expansion in cofactors about the first column. The term from the $(1, 1)$ entry, by the inductive hypothesis, is $\lambda(\lambda^{n-1} - a_{n-1}\lambda^{n-2} - \cdots - a_1)$, and the term from the $(n, 1)$ entry is $(-1)^{n+1}(-a_0)\det B$, where B is a lower triangular matrix of size $n - 1$ with $-1$ in every diagonal entry. Then $\det B = (-1)^{n-1}$, and substitution completes the induction.

29. In (a), we have
$$\det(\lambda I - AB) = \det\bigl(A(\lambda A^{-1} - B)\bigr) = \det A\,\det(\lambda A^{-1} - B) = \det(\lambda A^{-1} - B)\det A = \det\bigl((\lambda A^{-1} - B)A\bigr) = \det(\lambda I - BA).$$

For (b), we know from the fact that the characteristic polynomial of A is a polynomial that there are only finitely many ε for which $A + \varepsilon I$ fails to be invertible. Thus there is some $\varepsilon_0 > 0$ such that $A + \varepsilon I$ is invertible when $0 < \varepsilon < \varepsilon_0$. By (a), these ε's have $\det(\lambda I - (A + \varepsilon I)B) = \det(\lambda I - B(A + \varepsilon I))$. Since det is a polynomial in the entries of the matrix it is applied to, $\det(\lambda I - C)$ is a continuous function of the entries of C. Taking $C = (A + \varepsilon I)B$ and then $C = B(A + \varepsilon I)$, and letting ε tend to 0, we obtain $\det(\lambda I - AB) = \det(\lambda I - BA)$.
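Both sides of (b) are polynomials of degree n in λ, so checking them at $n + 1$ points with exact arithmetic tests the identity completely for a given pair. The sketch deliberately takes A singular, which is the case that (a) does not cover. The matrices are arbitrary choices of ours.

```python
from fractions import Fraction

def det(m):
    """Determinant by Gaussian elimination over the rationals."""
    a = [[Fraction(x) for x in row] for row in m]
    n, result = len(a), Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return result

def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]

def char_value(m, lam):
    """det(lam * I - m)."""
    n = len(m)
    return det([[(lam if i == j else 0) - m[i][j] for j in range(n)] for i in range(n)])

A = [[1, 2, 0], [2, 4, 0], [0, 0, 0]]    # singular: det A = 0
B = [[0, 1, 5], [3, -1, 2], [1, 1, 1]]
AB, BA = matmul(A, B), matmul(B, A)
assert AB != BA
# Both sides are polynomials of degree 3 in lam, so agreement at 4 points suffices.
for lam in range(-2, 2):
    assert char_value(AB, lam) == char_value(BA, lam)
```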

30. In $\mathbb R^1$, let the nth spanning set consist of $\{(r) \mid 0 < r < 1/n\}$. These each span $\mathbb R^1$, but their intersection is empty and the empty set does not span $\mathbb R^1$.

31. Let $\{v_\alpha\}$ be a basis of V. For each α, define a member $v'_\alpha$ of V' by saying that $v'_\alpha(v_\beta)$ is 1 for $\beta = \alpha$ and is 0 for $\beta \ne \alpha$. In addition, let $w_0$ be a member of V'' that is 1 on each $v'_\alpha$. Such a $w_0$ exists because the $v'_\alpha$ are linearly independent and can be extended to a basis of V'. Arguing by contradiction, suppose that $w_0$ is in $\iota(V)$. Then we can write $w_0 = \sum_{\beta \in F} c_\beta\,\iota(v_\beta)$ for some finite set F, and for each α we have $w_0(v'_\alpha) = \sum_{\beta \in F} c_\beta\,\iota(v_\beta)(v'_\alpha) = \sum_{\beta \in F} c_\beta v'_\alpha(v_\beta)$. The right side is nonzero only if some $\beta \in F$ has $v'_\alpha(v_\beta) \ne 0$, i.e., only if α is in F. On the other hand, the left side is 1 for every α. For this equality to hold for all α, F would have to be infinite, a contradiction.

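32. $\operatorname{Ann}(M + N) \subseteq \operatorname{Ann} M$, and $\operatorname{Ann}(M + N) \subseteq \operatorname{Ann} N$; thus $\operatorname{Ann}(M + N) \subseteq \operatorname{Ann} M \cap \operatorname{Ann} N$. If v' is 0 on M and is 0 on N, then it is 0 on $M + N$. Hence $\operatorname{Ann}(M + N) \supseteq \operatorname{Ann} M \cap \operatorname{Ann} N$.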

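33. $\operatorname{Ann}(M \cap N) \supseteq \operatorname{Ann} M$, and $\operatorname{Ann}(M \cap N) \supseteq \operatorname{Ann} N$; thus $\operatorname{Ann}(M \cap N) \supseteq \operatorname{Ann} M + \operatorname{Ann} N$. Let $\{u_\alpha\}$ be a basis of $M \cap N$, let $v_\beta$ be vectors added to $\{u_\alpha\}$ to obtain a basis of M, and let $w_\gamma$ be vectors added to $\{u_\alpha\}$ to obtain a basis of N. Then $\{u_\alpha\} \cup \{v_\beta\} \cup \{w_\gamma\}$ is a basis of $M + N$. Let $x_\delta$ be vectors added to this to obtain a basis of V. If v' is given in $\operatorname{Ann}(M \cap N)$, define $v'_1$ to be v' on all the basis vectors but the $v_\beta$, where it is to be 0, and define $v'_2 = v' - v'_1$. Then $v' = v'_1 + v'_2$ with $v'_1 \in \operatorname{Ann} M$ and $v'_2 \in \operatorname{Ann} N$. So $\operatorname{Ann}(M \cap N) \subseteq \operatorname{Ann} M + \operatorname{Ann} N$.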

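34. Let v be in M, and let v' be in $\operatorname{Ann} M$. Then $\iota(v)(v') = v'(v) = 0$. This proves (a).

For (b), Propositions 2.19 and 2.20a give $\dim\operatorname{Ann} M = \dim V' - \dim M$ and $\dim\operatorname{Ann}(\operatorname{Ann} M) = \dim V'' - \dim\operatorname{Ann} M = \dim V'' - (\dim V' - \dim M) = \dim M = \dim\iota(M)$. This equality in the presence of the inclusion $\iota(M) \subseteq \operatorname{Ann}(\operatorname{Ann} M)$ implies $\iota(M) = \operatorname{Ann}(\operatorname{Ann} M)$ by Corollary 2.4.

For (c), let V be as in Problem 31, and put $M = V$. Then $\operatorname{Ann}(M) = 0$ and $\operatorname{Ann}(\operatorname{Ann} M) = V'' \ne \iota(V)$.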

35.  Parts  (a)  and  (b)  follow  by  writing  out  individual  entries  of  the  products  as 
appropriate  sums. 

36. If A or D is not invertible, then suitable row operations on the matrix on the left side exhibit that matrix as not invertible, and hence both sides are 0. Thus we may assume that $A^{-1}$ and $D^{-1}$ exist. Problem 35c allows us to decompose the given matrix as
$$\begin{pmatrix} A & B \\ 0 & D \end{pmatrix} = \begin{pmatrix} A & 0 \\ 0 & I \end{pmatrix}\begin{pmatrix} I & 0 \\ 0 & D \end{pmatrix}\begin{pmatrix} I & A^{-1}B \\ 0 & I \end{pmatrix},$$
and the determinant of the product is the product of the determinants. Using the defining formula for det, we see that the first two determinants on the right side are $\det A$ and $\det D$. The third determinant is 1 since the matrix is triangular with 1's on the diagonal.

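37. In effect, we do row reduction with blocks, taking advantage of Problem 35c. We have
$$\begin{pmatrix} A & B \\ C & D \end{pmatrix} = \begin{pmatrix} A & 0 \\ 0 & I \end{pmatrix}\begin{pmatrix} I & A^{-1}B \\ C & D \end{pmatrix} = \begin{pmatrix} A & 0 \\ 0 & I \end{pmatrix}\begin{pmatrix} I & 0 \\ C & I \end{pmatrix}\begin{pmatrix} I & A^{-1}B \\ 0 & D - CA^{-1}B \end{pmatrix}.$$
Taking the determinant of both sides and using Problem 36, we obtain $\det\begin{pmatrix} A & B \\ C & D \end{pmatrix} = (\det A)\det(D - CA^{-1}B) = \det(AD - ACA^{-1}B)$, and this equals $\det(AD - CB)$ since $AC = CA$.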
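Here is an exact check of the formula. It assumes A is invertible and takes C to be a polynomial in A (here $C = A^2 - A$), so that $AC = CA$. The blocks are small integer matrices chosen arbitrarily.

```python
from fractions import Fraction

def det(m):
    """Determinant by Gaussian elimination over the rationals."""
    a = [[Fraction(x) for x in row] for row in m]
    n, result = len(a), Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return result

def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]

def matsub(x, y):
    return [[a - b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]

def block(a, b, c, d):
    """Assemble [[A, B], [C, D]] from equal-size square blocks."""
    return [ra + rb for ra, rb in zip(a, b)] + [rc + rd for rc, rd in zip(c, d)]

A = [[2, 1], [1, 3]]                 # invertible: det A = 5
C = matsub(matmul(A, A), A)          # C = A^2 - A commutes with A
B = [[1, 0], [4, -2]]
D = [[0, 5], [1, 1]]

assert matmul(A, C) == matmul(C, A)
lhs = det(block(A, B, C, D))
rhs = det(matsub(matmul(A, D), matmul(C, B)))
assert lhs == rhs
print(lhs)
```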

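38. The matrices $\begin{pmatrix} A \\ 0 \end{pmatrix}$ and $\begin{pmatrix} B & 0 \end{pmatrix}$ are of size n-by-n, and their products in the two orders are $\begin{pmatrix} AB & 0 \\ 0 & 0 \end{pmatrix}$ and BA. Problem 29 shows that $\det\left(\lambda I_n - \begin{pmatrix} AB & 0 \\ 0 & 0 \end{pmatrix}\right) = \det(\lambda I_n - BA)$. The left side equals $\lambda^{n-k}\det(\lambda I_k - AB)$, and the result follows.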

39.  Substitute  the  definitions  of  the  determinants  of  A(S)  and  A(S)  into  the  right 
side,  sort  out  the  signs,  and  verify  that  the  result  is  the  defining  expression  for  det  A. 
40. Expansion in cofactors about the last row gives $\det A_n = (A_n)_{nn}\det(A_n)^{nn} - (A_n)_{n,n-1}\det(A_n)^{n,n-1} = 2\det A_{n-1} + \det B$, where $(A_n)^{ij}$ denotes $A_n$ with row i and column j deleted, and B is a square matrix of size $n - 1$ whose upper left block of size $n - 2$ is $A_{n-2}$. Expansion in cofactors of $\det B$ about the last row shows that $\det B = -\det A_{n-2}$, and the stated formula results.

41. Inspection gives $\det A_1 = 2$ and $\det A_2 = 3$. The function f with $f(n) = \det A_n - (n + 1)$ thus has $f(1) = f(2) = 0$ and $f(n) = 2f(n - 1) - f(n - 2)$ for $n \ge 3$, and it must be 0 for all $n \ge 1$.
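The problem statement for $A_n$ is not reproduced in these hints. This sketch therefore assumes, as a hypothetical that is consistent with the recursion in Problem 40, that $A_n$ has 2 on the diagonal, $-1$ just above and below it, and 0 elsewhere. It then checks $\det A_n = n + 1$ with exact arithmetic.

```python
from fractions import Fraction

def det(m):
    """Determinant by Gaussian elimination over the rationals."""
    a = [[Fraction(x) for x in row] for row in m]
    n, result = len(a), Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return result

def tridiagonal(n):
    """Hypothetical A_n: 2 on the diagonal, -1 just above and below it, 0 elsewhere."""
    return [[2 if i == j else (-1 if abs(i - j) == 1 else 0) for j in range(n)]
            for i in range(n)]

for n in range(1, 13):
    assert det(tridiagonal(n)) == n + 1

# The same numbers satisfy the recursion det A_n = 2 det A_(n-1) - det A_(n-2).
d = [det(tridiagonal(n)) for n in range(1, 13)]
assert all(d[i] == 2 * d[i - 1] - d[i - 2] for i in range(2, len(d)))
```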

42. The only changes in (a) are notational. For (b), we compute $\det C_2 = \det C_3 = 2$, and the formula $\det C_n = 2$ follows as in Problem 41.

43.  For  (b),  we  interchange  the  first  two  rows  and  then  interchange  the  first  two 
columns.  The  determinant  does  not  change. 

44. For (b), we interchange the third and fourth rows and then interchange the third and fourth columns. For (c) we change the list of rows and columns of $A_n$ from 1, 2, 3, 4, 5 to 3, 5, 4, 2, 1.

45. The area of the rectangle is $(a + b)(c + d)$, the two trapezoids have areas $\tfrac12 d(a + (a + b))$ and $\tfrac12 a(d + (c + d))$, and the two triangles have areas $\tfrac12 ac$ and $\tfrac12 bd$. The difference is $bc - ad$. The answer is independent of the picture except for a sign. Thus the answer is the absolute value of the determinant.
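The conclusion can be cross-checked with the shoelace formula for polygon area. The figure is not reproduced here, so the sketch assumes that the parallelogram is spanned by the columns $(a, c)$ and $(b, d)$ of the matrix.

```python
from fractions import Fraction

def shoelace_area(vertices):
    """Area of a simple polygon from its vertices listed in order."""
    s = 0
    for (x1, y1), (x2, y2) in zip(vertices, vertices[1:] + vertices[:1]):
        s += x1 * y2 - x2 * y1
    return Fraction(abs(s), 2)

def parallelogram_area(a, b, c, d):
    """Area of the parallelogram with vertices 0, (a, c), (a + b, c + d), (b, d),
    i.e. spanned by the columns of [[a, b], [c, d]] (our assumed convention)."""
    return shoelace_area([(0, 0), (a, c), (a + b, c + d), (b, d)])

for a, b, c, d in [(3, 1, 1, 2), (1, 3, 2, 1), (2, -1, 5, 4)]:
    assert parallelogram_area(a, b, c, d) == abs(a * d - b * c)
```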

46.  The  geometric  effect  is  to  leave  the  left  edge  where  it  is  and  to  translate  the 
right  edge  parallel  to  itself  in  the  same  direction.  The  area  is  unchanged  because  the 
parallelogram  can  still  be  regarded  as  having  base  from  j  to  ^  ^  and  having  the 
same  distance  between  the  parallel  sides.  The  algebraic  effect  is  that  of  the  column 
operation  of  replacing  the  second  column  by  the  sum  of  it  and  s  times  the  first  column. 

47.  Right  multiplication  by  ^  j  ®  leaves  the  bottom  edge  where  it  is  and  translates 
the  top  edge  parallel  to  itself  in  the  same  direction;  algebraically  it  corresponds  to 
the  column  operation  of  replacing  the  first  column  by  the  sum  of  it  and  t  times  the 
second  column.  Right  multiplication  by  ^  ^  ^  j  interchanges  the  left  and  bottom  sides 
of  the  parallelogram  and  corresponds  to  interchange  of  the  two  columns  of  the  matrix. 
Right  multiplication  by  ^  ^  ®  ^  corresponds  to  stretching  the  left  side  by  a  factor  of  q 
if  q  >0,  along  with  reversing  the  direction  if  q  <  0,  and  the  algebraic  effect  is  the 
column  operation  of  multiplying  the  first  column  by  q .  The  effect  of  ^  ^  is  similar 
but  affects  the  bottom  edge  instead  of  the  left  edge. 

48.  The  roles  of  rows  and  columns  are  interchanged  by  the  transpose  operation, 
and  the  determinant  is  unaffected  by  transpose  according  to  Proposition  2.35.  In 
view  of  Proposition  1.29,  A  can  thus  be  put  in  reduced  column-echelon  form  by  a 
sequence  of  column  operations,  each  of  which  corresponds  to  right  multiplication 
by  a  suitable  elementary  matrix.  The  result  is  an  equality  saying  that  the  product 
of  A  and  some  elementary  matrices  is  the  identity.  Using  inverses  shows  that  A  is 
the product of elementary matrices. The product can be applied a step at a time to
the  cube  determined  by  the  standard  basis,  and  each  step  either  preserves  volume  or 
multiplies  it  by  a  known  factor,  up  to  a  minus  sign.  The  product  of  these  numerical 
factors  is  the  determinant,  up  to  a  minus  sign.  Hence  the  volume  of  the  parallelepiped 
has  to  be  the  product  of  these  factors,  with  its  sign  made  positive. 


Chapter  III 


1. Since $\operatorname{Tr}(B^*A) = \sum_{i,j} A_{ij}\overline{B_{ij}}$, the inner product is the usual inner product on the $n^2$ entries. Then (a) and (b) are immediate. For (c), use (a) together with Parseval's equality relative to the orthonormal basis in (b).

For (d), let U be the unitary matrix with columns $u_1, \dots, u_n$, i.e., the matrix of the identity relative to $\Gamma = (u_1, \dots, u_n)$ and the standard ordered basis Σ. Then $\|A\|_{HS}^2 = \operatorname{Tr}(A^*A) = \operatorname{Tr}(U^{-1}A^*AU) = \sum_{i,j}|(AU)_{ij}|^2 = \sum_j \|Au_j\|^2$, and this equals $\sum_{i,j}|v_i^*Au_j|^2$ by Parseval's equality.

In (e), $W^\perp$ consists of all matrices that are 0 along the diagonal. It has dimension $n^2 - n$.


2. The system has unknowns $c_0,c_1,\dots,c_n$, where $p_n(x)=c_0+c_1x+\cdots+c_nx^n$,
and the $k$th equation, for $0\le k\le n$, comes from the equality for $f(x)=x^k$, namely
$$2^{-k}=\sum_{j=0}^n(j+k+1)^{-1}c_j.$$

3. $(LM)^*=M^*L^*=ML$ is equal to $LM$ if and only if $LM=ML$.

4. A vector $u$ is in $\ker L$ if and only if $(L(u),v)=0$ for all $v$, if and only if
$(u,L^*(v))=0$ for all $v$, if and only if $u$ is in $(\operatorname{image}L^*)^\perp$.

5.  There  are  none.  The  characteristic  polynomial  has  no  real  roots,  but  all  roots 
must  be  real  if  A  is  Hermitian. 

6. The map $v_1\mapsto(L(v_1),v_2)_2$ is a linear functional on $V_1$ and hence is given by
the inner product with a unique member $u_1$ of $V_1$, i.e., $(L(v_1),v_2)_2=(v_1,u_1)_1$, and
we define this element $u_1$ to be $L^*(v_2)$. We readily check that $L^*$ is linear, and (a) is
then proved. The proof of (b) proceeds in the same way as in the case that $V_1=V_2$.

7. In (a), if $v$ is in $S^\perp\cap T^\perp$, then $v$ is in $V^\perp=0$. Thus $S^\perp+T^\perp$ is a direct
sum. We have $\dim V=\dim S+\dim T=(\dim V-\dim S^\perp)+(\dim V-\dim T^\perp)=
2\dim V-\dim S^\perp-\dim T^\perp$. Therefore $\dim V=\dim(S^\perp+T^\perp)$. The inclusion
plus the equality of the finite dimensions forces $V=S^\perp+T^\perp$.

In (b), let $\lambda$ be 0 or 1. Then $E^*u=\lambda u$ if and only if $(E^*u,v)=\lambda(u,v)$ for all
$v$, if and only if $(u,Ev)=\lambda(u,v)$ for all $v$. When $\lambda=1$, this says that $E^*u=u$ if
and only if $u\perp(I-E)v$ for all $v$, hence if and only if $u\perp T$, hence if and only if $u$
is in $T^\perp$. When $\lambda=0$, it says that $E^*u=0$ if and only if $u\perp Ev$ for all $v$, hence if
and only if $u\perp S$, hence if and only if $u$ is in $S^\perp$.
8. The formulas of the Gram–Schmidt orthogonalization process have $v_j=c_j(ge_j)+\sum_{i<j}a_{ij}v_i$ with $c_j>0$. Therefore $ge_j=c_j^{-1}v_j+\sum_{i<j}b_{ij}v_i$, and
$$\begin{aligned}(k^{-1}g)_{ij}&=\sum_l(k^{-1})_{il}g_{lj}=\sum_l(k^{-1})_{il}(ge_j)_l\\
&=c_j^{-1}\sum_l(k^{-1})_{il}(v_j)_l+\sum_l\sum_{m<j}(k^{-1})_{il}b_{mj}(v_m)_l\\
&=c_j^{-1}(k^{-1}v_j)_i+\sum_{m<j}b_{mj}(k^{-1}v_m)_i=c_j^{-1}\delta_{ij}+\sum_{m<j}b_{mj}\delta_{im}.\end{aligned}$$
If $i=j$, the right side is $c_j^{-1}$ and is positive. If $i>j$, then every term on the right
side is 0. Thus $k^{-1}g$ is upper triangular with positive diagonal entries. Since $k$ carries
the standard orthonormal basis to the orthonormal basis $\{v_1,\dots,v_n\}$, $k$ is unitary.
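For real matrices, the factorization $g=kt$ of Problem 8 is what numerical linear algebra calls a QR decomposition. The following sketch assumes a real invertible input given as a list of rows; the name `gram_schmidt_kt` is hypothetical. It runs classical Gram–Schmidt on the columns $ge_j$: the orthonormal vectors $v_j$ become the columns of $k$, and the recorded coefficients form the upper triangular $t=k^{-1}g$, whose diagonal entries are the positive numbers $c_j^{-1}$.

```python
import math

def gram_schmidt_kt(g):
    """Factor a real invertible matrix g (list of rows) as g = k t."""
    n = len(g)
    cols = [[g[i][j] for i in range(n)] for j in range(n)]   # the vectors g e_j
    vs = []                                                   # orthonormal v_1, ..., v_n
    t = [[0.0] * n for _ in range(n)]
    for j, w in enumerate(cols):
        u = list(w)
        for i, v in enumerate(vs):
            coeff = sum(a * b for a, b in zip(w, v))          # (g e_j, v_i)
            t[i][j] = coeff                                   # entry above the diagonal
            u = [x - coeff * y for x, y in zip(u, v)]
        norm = math.sqrt(sum(x * x for x in u))               # positive since g is invertible
        t[j][j] = norm
        vs.append([x / norm for x in u])
    k = [[vs[j][i] for j in range(n)] for i in range(n)]      # columns of k are the v_j
    return k, t
```

Classical Gram–Schmidt is chosen here because it follows the hint line by line; numerically, the modified variant or Householder reflections are more stable.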

9. For (a), the Spectral Theorem and Corollary 3.22 show that $A$ is similar to a
diagonal matrix with positive diagonal entries. Thus $\det A>0$. In (b), we specialize
the inequality $x^*Ax>0$ to $x$'s that are 0 except in the entries numbered $i_1,\dots,i_k$, and
we find that the submatrix is positive definite. Then the result follows from Corollary
3.22.

10. Take $g=\sqrt A$ in Problem 8 and obtain $\sqrt A=kt$ with $k$ unitary and $t$ upper
triangular with positive diagonal entries. Since $\sqrt A$ is Hermitian, $A=(\sqrt A)^*(\sqrt A)=(kt)^*(kt)=t^*t$.
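A factorization $A=t^*t$ with $t$ upper triangular and positive on the diagonal is unique, so the same $t$ can be computed without first forming $\sqrt A$; this is the Cholesky factorization. The sketch below handles only real symmetric positive definite matrices, and its function name `cholesky_upper` is our own.

```python
import math

def cholesky_upper(a):
    """Return upper triangular t with positive diagonal and a = t^T t."""
    n = len(a)
    t = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            # a[i][j] = sum over l < i of t[l][i] t[l][j], plus t[i][i] t[i][j]
            s = a[i][j] - sum(t[l][i] * t[l][j] for l in range(i))
            if j == i:
                if s <= 0:
                    raise ValueError("matrix is not positive definite")
                t[i][i] = math.sqrt(s)
            else:
                t[i][j] = s / t[i][i]
    return t
```

The `ValueError` branch is where a non-positive-definite input is detected: some pivot $s$ fails to be positive.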

11. The roots of the characteristic polynomial are $\tfrac12(a+d+s)$ and $\tfrac12(a+d-s)$, where
$s=\sqrt{(a-d)^2+4|b|^2}$. Let $r=\tfrac12(-a+d+s)$. Then $D=\begin{pmatrix}\frac12(a+d+s)&0\\0&\frac12(a+d-s)\end{pmatrix}$
and $U=(|b|^2+r^2)^{-1/2}\begin{pmatrix}b&-r\\r&\bar b\end{pmatrix}$.

12. In (a), the conditions $ad-|b|^2>0$ and $a+d>0$ together are necessary
and sufficient. In (b), let $\sqrt D=\begin{pmatrix}\sqrt{\frac12(a+d+s)}&0\\0&\sqrt{\frac12(a+d-s)}\end{pmatrix}$ and let $U$ be as in the
previous problem. Then the positive definite square root of $A$ is $U\sqrt D\,U^{-1}$.
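The formulas of Problems 11 and 12 can be checked numerically. This sketch assumes that the Hermitian matrix in question is $A=\begin{pmatrix}a&b\\\bar b&d\end{pmatrix}$ with $a,d$ real and $b\ne0$, and it uses the unitary matrix whose columns are the normalized eigenvectors $(b,r)$ and $(-r,\bar b)$ with $r=\frac12(-a+d+s)$; that explicit choice of $U$ is our own, and other phase choices work equally well. The helper names are hypothetical.

```python
import math

def hermitian_2x2(a, d, b):
    """Eigenvalues and a unitary U with A = U D U^{-1} for A = [[a, b], [conj(b), d]].

    Assumes a, d real and b != 0.
    """
    s = math.sqrt((a - d) ** 2 + 4 * abs(b) ** 2)
    lam1, lam2 = (a + d + s) / 2, (a + d - s) / 2
    r = (-a + d + s) / 2
    c = 1 / math.sqrt(abs(b) ** 2 + r ** 2)          # normalizing constant
    u = [[c * b, -c * r], [c * r, c * b.conjugate()]]
    return (lam1, lam2), u

def matmul(x, y):
    return [[sum(x[i][l] * y[l][j] for l in range(2)) for j in range(2)] for i in range(2)]

def conj_transpose(x):
    return [[x[j][i].conjugate() for j in range(2)] for i in range(2)]

def sqrt_pos_def_2x2(a, d, b):
    """U sqrt(D) U^{-1}; requires ad - |b|^2 > 0 and a + d > 0."""
    (lam1, lam2), u = hermitian_2x2(a, d, b)
    root_d = [[math.sqrt(lam1), 0.0], [0.0, math.sqrt(lam2)]]
    return matmul(matmul(u, root_d), conj_transpose(u))   # U^{-1} = U^* since U is unitary
```

For instance, with $a=2$, $d=3$, $b=1+i$ one gets $s=3$, eigenvalues 4 and 1, and $r=2$.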

13. The Spectral Theorem shows that $A$ has a basis of eigenvectors, each with a
real eigenvalue. If $v$ is an eigenvector with eigenvalue $\lambda$, then $v^*Av=0$ says that
$\lambda\|v\|^2=0$. So every eigenvalue is 0, and $A$, being similar to a diagonal matrix, has
to be 0.

14. Choosing a basis of eigenvectors, we may solve the corresponding problem
for diagonal matrices. Thus let $A$ be a diagonal matrix, and assume, without loss
of generality, that $A_{11}=\cdots=A_{kk}=1$ and $A_{jj}\ne1$ for $j>k$. Then the given
equation $(I-A)^2y=(I-A)x$ says that $(1-A_{jj})^2y_j=(1-A_{jj})x_j$ for all $j$. Thus
define $y_j$ to be 0 if $j\le k$, and choose $y_j=(1-A_{jj})^{-1}x_j$ for $j>k$.

15. $LL^*=(UP)(UP)^*=(PU)(PU)^*=PUU^*P=P^2=PU^*UP=(UP)^*(UP)=L^*L$.

16.  The  family  has  a  basis  of  simultaneous  eigenvectors,  and  the  matrices  are  all 
diagonal  in  this  basis.  So  the  answer  is  the  dimension  of  the  vector  space  of  diagonal 
matrices,  namely  n. 
17. In (a), $c^*G(v_1,\dots,v_n)c=\sum_{i,j}\bar c_i(v_j,v_i)c_j=\bigl(\sum_jc_jv_j,\sum_ic_iv_i\bigr)=
\|c_1v_1+\cdots+c_nv_n\|^2$. Thus Corollary 3.22 shows that $G(v_1,\dots,v_n)$ is positive
semidefinite. Moreover, $\|c_1v_1+\cdots+c_nv_n\|^2=0$ for some $c\ne0$ if and only
if $v_1,\dots,v_n$ are linearly dependent. Thus $G(v_1,\dots,v_n)$ is definite if and only if
$v_1,\dots,v_n$ are linearly independent. We know that a positive semidefinite matrix is
definite if and only if it is invertible, and thus $\det G(v_1,\dots,v_n)>0$ if and only
if $v_1,\dots,v_n$ are linearly independent; this proves (b). In (c), equality holds in the
Schwarz inequality if and only if the two vectors are linearly dependent, i.e., if and
only if one of them is a multiple of the other.
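A small exact computation illustrates (b) of Problem 17 for real vectors. In this sketch the helper names are hypothetical, the Gram matrix has entries $(v_j,v_i)$ (symmetric in the real case), and its determinant is computed by Gaussian elimination over the rationals so that a zero determinant is detected exactly.

```python
from fractions import Fraction

def gram_matrix(vectors):
    """Gram matrix of real vectors: entry (i, j) is the dot product of v_i and v_j."""
    return [[sum(x * y for x, y in zip(vi, vj)) for vj in vectors] for vi in vectors]

def det(m):
    """Determinant by Gaussian elimination with exact rational arithmetic."""
    m = [[Fraction(x) for x in row] for row in m]
    n = len(m)
    result = Fraction(1)
    for j in range(n):
        p = next((i for i in range(j, n) if m[i][j] != 0), None)
        if p is None:
            return Fraction(0)
        if p != j:
            m[j], m[p] = m[p], m[j]      # a row swap changes the sign
            result = -result
        result *= m[j][j]
        for i in range(j + 1, n):
            f = m[i][j] / m[j][j]
            m[i] = [x - f * y for x, y in zip(m[i], m[j])]
    return result
```

Independent vectors give a positive Gram determinant, dependent ones give 0; for two vectors the determinant is $\|u\|^2\|v\|^2-(u,v)^2$, the gap in the Schwarz inequality.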

18.  This  is  immediate  by  induction. 

19. For (a), the left side is $D^2(X^{n+1})=(n+1)D(X^nX')$. Comparing with the
expected right side, we see that we are to show that
$$nD(X^nX')=(2n+1)nX''X^n+4n^2X^{n-1}.$$
The left side equals $nX^{n-1}$ times $n(X')^2+XX''$, while the right side equals
$nX^{n-1}$ times $(2n+1)X''X+4n$. Since
$$n(X')^2+XX''=4nx^2+2x^2-2=(4n+2)x^2-(4n+2)+4n=(2n+1)X''X+4n,$$
(a) is proved.

For (b), the Leibniz rule gives $D^n(X'Y)=X'D^nY+nX''D^{n-1}Y$ for any $Y$, since $X'''=0$.
Meanwhile, application of $D^{n-1}$ to (a) yields
$$D^{n+1}(X^{n+1})=(2n+1)D^n(X'X^n)-n(2n+1)X''D^{n-1}(X^n)-4n^2D^{n-1}(X^{n-1}).$$
Substituting with $Y=(2n+1)X^n$, we obtain (b), namely $D^{n+1}(X^{n+1})=(2n+1)X'D^n(X^n)-4n^2D^{n-1}(X^{n-1})$. The recursion in conclusion (c)
follows immediately by multiplying by $(2^{n+1}n!)^{-1}$.

For (d), conclusion (c) and the definition of $P_n$ show that $Q_n=P_n-R_n$ satisfies
$Q_0=Q_1=0$ and $(n+1)Q_{n+1}(x)=(2n+1)xQ_n(x)-nQ_{n-1}(x)$. Thus $Q_n(x)=0$
for every $n$ by induction.
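The recursion in (c) can be checked against the definition $P_n=(2^nn!)^{-1}D^n(X^n)$ with $X=x^2-1$, the normalization consistent with Problems 20–21, where $(P_n,P_n)=2/(2n+1)$. The sketch below represents polynomials as lists of `Fraction` coefficients, lowest degree first; all helper names are our own.

```python
from fractions import Fraction
from math import factorial

def poly_mul(p, q):
    """Product of two coefficient lists."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def poly_add(p, q):
    """Sum of two coefficient lists of possibly different lengths."""
    m = max(len(p), len(q))
    p = p + [Fraction(0)] * (m - len(p))
    q = q + [Fraction(0)] * (m - len(q))
    return [a + b for a, b in zip(p, q)]

def scale(c, p):
    return [c * a for a in p]

def deriv(p):
    return [i * p[i] for i in range(1, len(p))] or [Fraction(0)]

def rodrigues(n):
    """P_n = D^n(X^n) / (2^n n!) with X = x^2 - 1."""
    p = [Fraction(1)]
    for _ in range(n):
        p = poly_mul(p, [Fraction(-1), Fraction(0), Fraction(1)])   # multiply by X
    for _ in range(n):
        p = deriv(p)
    return scale(Fraction(1, 2 ** n * factorial(n)), p)

def recursion_holds(n):
    """Check (n+1) P_{n+1} = (2n+1) x P_n - n P_{n-1} for n >= 1."""
    lhs = scale(n + 1, rodrigues(n + 1))
    x_pn = poly_mul([Fraction(0), Fraction(1)], rodrigues(n))
    rhs = poly_add(scale(2 * n + 1, x_pn), scale(-n, rodrigues(n - 1)))
    return all(c == 0 for c in poly_add(lhs, scale(-1, rhs)))
```

For example, `rodrigues(2)` gives the coefficients of $\frac12(3x^2-1)$.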

20–21. Write $X=x^2-1$. Since $X^n=(x-1)^nf(x)$, the function $X^n$ has all
derivatives through order $n-1$ equal to 0 at $x=1$. The same conclusion applies
also at $x=-1$. If $m\le n$, integration by parts gives
$$\begin{aligned}\int_{-1}^1D^m(X^m)D^n(X^n)\,dx&=\Bigl[D^m(X^m)D^{n-1}(X^n)\Bigr]_{-1}^{1}-\int_{-1}^1D^{m+1}(X^m)D^{n-1}(X^n)\,dx\\
&=-\int_{-1}^1D^{m+1}(X^m)D^{n-1}(X^n)\,dx\\
&=\cdots=(-1)^k\int_{-1}^1D^{m+k}(X^m)D^{n-k}(X^n)\,dx\end{aligned}$$
for $k\le n$. If $m<n$, then taking $k=m+1$ gives 0 on the right side because
$D^{2m+1}(X^m)=0$. If $m=n$, then taking $k=n$ gives $(-1)^n\int_{-1}^1X^nD^{2n}(X^n)\,dx=
(-1)^n(2n)!\int_{-1}^1X^n\,dx$ on the right side. Therefore
$$(D^n(X^n),D^n(X^n))=(-1)^n(2n)!\,(-1)^n\frac{2^{2n+1}(n!)^2}{(2n+1)!}=\frac{2(2^nn!)^2}{2n+1},$$
and $(P_n,P_n)=\dfrac{2}{2n+1}$.

22. The expansion for (a) is
$$\begin{aligned}D^{n+1}[(D(X^n))X]&=D^{n+1}(D(X^n))X+(n+1)D^n(D(X^n))X'+\tfrac12n(n+1)D^{n-1}(D(X^n))X''\\
&=XD^2(D^n(X^n))+(n+1)X'D(D^n(X^n))+\tfrac12n(n+1)X''D^n(X^n),\end{aligned}$$
and the expansion for (b) is
$$\begin{aligned}D^{n+1}[(D(X^n))X]&=D^{n+1}(nX^nX')=D^{n+1}(nX^n)X'+(n+1)D^n(nX^n)X''\\
&=nD(D^n(X^n))X'+n(n+1)D^n(X^n)X''.\end{aligned}$$
Thus, for (c), we get $(x^2-1)D^2(P_n(x))+(n+1)2xD(P_n(x))+\frac12n(n+1)2P_n(x)=
nD(P_n(x))2x+n(n+1)P_n(x)2$. This simplifies to
$$(x^2-1)P_n''+2(n+1)xP_n'+n(n+1)P_n=2nxP_n'+2n(n+1)P_n$$
and then to $(1-x^2)P_n''-2xP_n'+n(n+1)P_n=0$.
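The conclusions of Problems 20–21 and 22 can both be confirmed exactly for small $n$. This sketch builds $P_0,\dots,P_N$ from the recursion of Problem 19(c), integrates over $[-1,1]$ term by term (odd powers contribute 0, and $\int_{-1}^1x^m\,dx=2/(m+1)$ for even $m$), and evaluates the left side of Legendre's equation; the helper names are hypothetical.

```python
from fractions import Fraction

def legendre_list(N):
    """P_0, ..., P_N from (n+1) P_{n+1} = (2n+1) x P_n - n P_{n-1}."""
    ps = [[Fraction(1)], [Fraction(0), Fraction(1)]]
    for n in range(1, N):
        x_pn = [Fraction(0)] + ps[n]                 # x * P_n
        prev = ps[n - 1] + [Fraction(0)] * 2         # P_{n-1}, padded to the same length
        ps.append([((2 * n + 1) * a - n * b) / (n + 1) for a, b in zip(x_pn, prev)])
    return ps[: N + 1]

def inner(p, q):
    """Exact value of the integral of p(x) q(x) over [-1, 1]."""
    total = Fraction(0)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if (i + j) % 2 == 0:                     # odd powers integrate to 0
                total += a * b * Fraction(2, i + j + 1)
    return total

def deriv(p):
    return [i * p[i] for i in range(1, len(p))] or [Fraction(0)]

def ode_residual(p, n):
    """Coefficients of (1 - x^2) p'' - 2x p' + n(n+1) p."""
    d1, d2 = deriv(p), deriv(deriv(p))
    out = [Fraction(0)] * (len(p) + 2)
    for i, c in enumerate(d2):                       # (1 - x^2) p''
        out[i] += c
        out[i + 2] -= c
    for i, c in enumerate(d1):                       # -2x p'
        out[i + 1] -= 2 * c
    for i, c in enumerate(p):                        # n(n+1) p
        out[i] += n * (n + 1) * c
    return out
```

Every `inner(ps[m], ps[n])` should be 0 for $m\ne n$ and $2/(2n+1)$ for $m=n$, and every residual should vanish identically.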

24. In Problems 24–28, there is no difficulty with addition, and we have to check
something only about scalar multiplication. For Problem 24, we need to check in $\bar V$
that $(ab)v=a(bv)$, $1v=v$, $a(u+v)=au+av$, and $(a+b)v=av+bv$. These
are satisfied in $\bar V$ because the identities $(\bar a\bar b)v=\bar a(\bar bv)$, $\bar1v=v$, $\bar a(u+v)=\bar au+\bar av$,
and $(\bar a+\bar b)v=\bar av+\bar bv$ hold in $V$.

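25. We are to see that $L$ respects scalar multiplication, and the argument is that,
writing scalar multiplication in the conjugate spaces as $c\cdot v=\bar cv$, we have $L(c\cdot v)=L(\bar cv)=\bar cL(v)=c\cdot L(v)$.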

26. With $au$ and $bv$ formed in $\bar V$, we have $(au,bv)_{\bar V}=(\bar bv,\bar au)_V=a\bar b(v,u)_V=a\bar b(u,v)_{\bar V}$, as required.

27. Let $\ell$ in $V'$ correspond to $v$ in $V$, so that $\ell(u)=(u,v)_V=(v,u)_{\bar V}$. Then
$\ell$ in $V'$ corresponds to $v$ in $\bar V$, while $(c\ell)(u)=c(v,u)_{\bar V}=(cv,u)_{\bar V}$, with $cv$ formed in $\bar V$, shows that $c\ell$
corresponds to $cv$ in $\bar V$.

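28. Let $\ell$ in $V'$ correspond to $v$ in $\bar V$, and let $\bar L$ denote $L$ regarded as a linear map of $\bar V$. Then $L'(\ell)(u)=\ell(L(u))=(v,L(u))_{\bar V}=
(v,\bar L(u))_{\bar V}=((\bar L)^*(v),u)_{\bar V}$, and this says that $L'(\ell)$ corresponds to $(\bar L)^*(v)$, i.e., $L'$
corresponds to $(\bar L)^*$.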
Hints  for  Solutions  of  Problems 


29. In (a), it is enough to check the result for $p$ and $q$ equal to monomials, and
(b) is a direct calculation. In (c), let $p(x)=\sum c_{k_1,\dots,k_n}x_1^{k_1}\cdots x_n^{k_n}$. The bilinearity
and (b) show that $(p,p)=\sum c_{k_1,\dots,k_n}^2(x_1^{k_1}\cdots x_n^{k_n},x_1^{k_1}\cdots x_n^{k_n})$, and this is positive
unless all the coefficients are 0.

30. The polynomial $p$ is in $H_N$ if and only if $\partial(|x|^2)p=0$, if and only if
$(\partial(|x|^2)p,q)=0$ for all $q$ in $V_{N-2}$, if and only if $\partial(q)(\partial(|x|^2)p)=0$ for all $q$ in
$V_{N-2}$, if and only if $\partial(|x|^2q)p=0$ for all $q$ in $V_{N-2}$, if and only if $(p,|x|^2q)=0$
for all $q$ in $V_{N-2}$, if and only if $p$ is in $(|x|^2V_{N-2})^\perp$.

31. Problem 30 gives $V_N=H_N\oplus|x|^2V_{N-2}$, and we iterate this decomposition.

32. A basis of $|x|^2V_2$ is $\{|x|^2x_1^2,\ |x|^2x_1x_2,\ |x|^2x_2^2\}$. Apply the Gram–Schmidt
orthogonalization procedure to obtain an orthonormal basis $\{|x|^2u_1,|x|^2u_2,|x|^2u_3\}$,
and write $x_1^4+x_2^4=h_4+\sum_{j=1}^3(x_1^4+x_2^4,|x|^2u_j)|x|^2u_j$. Then $h_4$ is harmonic by
Problem 30. A basis of $|x|^2V_0$ is $|x|^2$, and hence an orthonormal basis consists of
the single vector $w=\||x|^2\|^{-1}|x|^2$. Write $u_j=h_{2,j}+(u_j,w)w$ for each $j$, and
substitute. Each $h_{2,j}$ is harmonic. Then we have
$$\begin{aligned}x_1^4+x_2^4&=h_4+\sum_{j=1}^3(x_1^4+x_2^4,|x|^2u_j)|x|^2\bigl(h_{2,j}+(u_j,w)w\bigr)\\
&=h_4+|x|^2\sum_{j=1}^3(x_1^4+x_2^4,|x|^2u_j)h_{2,j}\\
&\qquad+|x|^4\sum_{j=1}^3(x_1^4+x_2^4,|x|^2u_j)(u_j,w)\,\||x|^2\|^{-1}\end{aligned}$$
with $h_4$ in $H_4$, each $h_{2,j}$ in $H_2$, and the last sum in $H_0$.

33. Let $P$ be the positive semidefinite square root of $B$. Then $AB=APP$, and
hence $\det(\lambda I-AB)=\det(\lambda I-PAP)$. Consequently $AB$ has the same eigenvalues
as $PAP$. The latter is positive semidefinite since $(PAPv,v)=(A(Pv),Pv)\ge0$.
Therefore all the eigenvalues of $AB$ are $\ge0$.

34. Since $(P^{-1}ABCP^{-1}v,v)=(ABC(P^{-1}v),P^{-1}v)$, $ABC$ is positive
semidefinite if and only if $P^{-1}ABCP^{-1}$ is positive semidefinite, if and only if
$P^{-1}ABCP^{-1}$ has all eigenvalues $\ge0$. But $P^{-1}ABCP^{-1}$ has the same eigenvalues
as $ABCP^{-1}P^{-1}=AB$, which has all eigenvalues $\ge0$ by the previous problem.


Chapter  IV 

1. If $a^2=b^2=(ab)^2=1$, then $a^{-1}=a$, $b^{-1}=b$, and $(ab)^{-1}=ab$. So
$ab=(ab)^{-1}=b^{-1}a^{-1}=ba$.

2. Number the vertices counterclockwise as 1, 2, 3, 4. The motions in $D_4$ are then
given by permutations as 1, (1 2)(3 4), (1 4)(2 3), (1 3), (2 4), (1 2 3 4),
(1 3)(2 4), (1 4 3 2).
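The list of eight motions can be confirmed by generating the group from the rotation (1 2 3 4) and the reflection (1 3), both taken from the list above. In this sketch permutations of {1, 2, 3, 4} are stored as dictionaries, and the helper names are our own.

```python
def compose(p, q):
    """The permutation p o q (apply q first); permutations are dicts."""
    return {i: p[q[i]] for i in q}

def generate(generators, points=(1, 2, 3, 4)):
    """Subgroup generated by the given permutations (finite case: closure suffices)."""
    identity = {i: i for i in points}
    group, frontier = [identity], [identity]
    while frontier:
        new = []
        for g in frontier:
            for s in generators:
                h = compose(s, g)
                if h not in group:
                    group.append(h)
                    new.append(h)
        frontier = new
    return group

def cycle_notation(p):
    """Disjoint-cycle string such as '(1 2)(3 4)'; the identity is written '1'."""
    seen, parts = set(), []
    for start in sorted(p):
        if start in seen or p[start] == start:
            continue
        cycle, i = [start], p[start]
        seen.add(start)
        while i != start:
            cycle.append(i)
            seen.add(i)
            i = p[i]
        parts.append("(" + " ".join(map(str, cycle)) + ")")
    return "".join(parts) or "1"

rotation = {1: 2, 2: 3, 3: 4, 4: 1}      # (1 2 3 4)
reflection = {1: 3, 2: 2, 3: 1, 4: 4}    # (1 3)
d4 = generate([rotation, reflection])
```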

2A. In (c), the result follows from (a) and (b) if $r\ne0$. If $r=0$, both sides are 1.
3. Choose integers $x$ and $y$ with $xl+y|G|=1$. Then $a=a^{xl+y|G|}=
(a^l)^x(a^{|G|})^y=(a^l)^x$ since $a^{|G|}=1$, and this is a power of an element of $H$.

4. Define $\varphi:G\to G'$ by $\varphi(a)=a$. Then $\varphi(a)\circ\varphi(b)=a\circ b=ba=\varphi(ba)=
\varphi(a\circ b)$. From this equality it follows that $G'$ is a group and that $\varphi$ is an isomorphism.

5. For $n>0$, $(ab)^n=abab\cdots ab=a^nb^n$; also $(ab)^{-n}=((ab)^{-1})^n=
(b^{-1}a^{-1})^n=(a^{-1}b^{-1})^n=(a^{-1})^n(b^{-1})^n=a^{-n}b^{-n}$. In $\mathfrak S_3$, $a\mapsto a^2$ is not a
homomorphism since four elements are sent to 1 and since 4 does not divide $|\mathfrak S_3|=6$.

6. Define $\varphi:H\times K\to HK$ by $\varphi(h,k)=hk$. What needs proof is that
members of $H$ commute with members of $K$. If $h$ is in $H$ and $k$ is in $K$, then
$(hkh^{-1})h=hk=k(k^{-1}hk)$. Since $H$ and $K$ are normal, $hkh^{-1}$ is in $K$ and
$k^{-1}hk$ is in $H$. Then $k^{-1}(hkh^{-1})=(k^{-1}hk)h^{-1}$ and $H\cap K=\{1\}$ together imply
$k^{-1}(hkh^{-1})=1=(k^{-1}hk)h^{-1}$. From the first of these, $k=hkh^{-1}$. Therefore
$hk=kh$.

7. Since $\mathrm{GCD}(1234,8191)=1$, there exist $x$ and $y$ with $1234x+8191y=1$,
and $x$ and $y$ can be found explicitly by the Euclidean algorithm of Section 1.1. For
this $x$, $1234x\equiv1\bmod 8191$.
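The Euclidean algorithm referred to above can be run in its extended form, which carries the coefficients along at each division step. A standard iterative version (the function name is our own) follows.

```python
def extended_gcd(a, b):
    """Return (g, x, y) with a*x + b*y == g == gcd(a, b)."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        # Each remainder is kept as an integer combination of a and b.
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    return old_r, old_x, old_y

g, x, y = extended_gcd(1234, 8191)
inverse = x % 8191          # the inverse of 1234 modulo 8191, as a residue in [0, 8191)
```

In Python 3.8 and later, the same inverse is also available directly as `pow(1234, -1, 8191)`.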

8. The members $1,2,\dots,p-1$ of $\mathbb F_p$ are roots of $X^{p-1}-1=0$. By iterated use
of the Factor Theorem, $X^{p-1}-1=(X-1)(X-2)\cdots(X-(p-1))Q(X)$, and
$Q(X)$ must have degree 0. Checking the coefficient of $X^{p-1}$ on both sides shows
that $Q(X)=1$. Evaluating at $X=0$ gives $-1\equiv(-1)(-2)\cdots(-(p-1))\bmod p$.
Since $p$ is odd, this equation reads $(p-1)!\equiv-1\bmod p$.
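A direct numerical check of the conclusion $(p-1)!\equiv-1\bmod p$ for a few odd primes is below; it only illustrates the statement and is no substitute for the argument. The helper name is hypothetical, and it reduces modulo $n$ at each step to keep the numbers small.

```python
def wilson_residue(n):
    """Return (n - 1)! mod n."""
    r = 1
    for m in range(2, n):
        r = (r * m) % n
    return r

odd_primes = [3, 5, 7, 11, 13, 17, 19, 23]
```

For a composite such as 9 the residue is 0 instead, since $8!$ contains the factors 3 and 6.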

9. Corollary 4.39 shows that such a group has to be abelian, and Theorem 4.56
shows that it is the direct sum of cyclic groups. Thus it must be $C_{p^2}$ or $C_p\times C_p$, up
to isomorphism.

10. If $y=axa^{-1}$, then $y^n=ax^na^{-1}$. This proves (a). Also, $ba=a^{-1}(ab)a$
shows that $ba$ and $ab$ are conjugate. This proves (b).

11. There are four classes: $C_1$ = {1}, $C_2$ = {(1 2)(3 4), (1 3)(2 4), (1 4)(2 3)},
$C_3$ = {(1 2 3), (3 4 1), (2 1 4), (4 3 2)}, $C_4$ = {(1 3 2), (3 1 4), (2 4 1), (4 2 3)}.
The centralizer of the first element of each class is $\mathfrak A_4$ for $C_1$, $C_1\cup C_2$ for $C_2$, and
{1, (1 2 3), (1 3 2)} for $C_3$ and $C_4$. Since $\mathfrak A_4$ has no element of order 6, it has no
subgroup $C_6$. In a subgroup $\mathfrak S_3$, an element of order 3 is conjugate to its square, but
no element of order 3 in $\mathfrak A_4$ is conjugate to its square.
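The facts used above about $\mathfrak A_4$ can be confirmed by brute force. In this sketch a permutation of $\{0,1,2,3\}$ is a tuple `p` with `p[i]` the image of `i`, so the 3-cycle (1 2 3) becomes `(1, 2, 0, 3)` after shifting labels down by one; all helper names are our own.

```python
from itertools import permutations

def compose(p, q):
    """p o q: apply q first."""
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    inv = [0] * len(p)
    for i, pi in enumerate(p):
        inv[pi] = i
    return tuple(inv)

def is_even(p):
    inversions = sum(1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j])
    return inversions % 2 == 0

A4 = [p for p in permutations(range(4)) if is_even(p)]

def conjugacy_class(x, group):
    return {compose(compose(g, x), inverse(g)) for g in group}

def conjugacy_classes(group):
    remaining, classes = set(group), []
    while remaining:
        cls = conjugacy_class(next(iter(remaining)), group)
        classes.append(cls)
        remaining -= cls
    return classes

def order(p):
    identity, k, q = tuple(range(len(p))), 1, p
    while q != identity:
        q, k = compose(q, p), k + 1
    return k
```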

12.  A  subgroup  of  order  30  would  have  index  2  and  would  thus  be  normal,  in 
contradiction  to  Theorem  4.47. 

13.  This  is  a  special  case  of  Proposition  4.36. 

14. Since $H$ is normal, $G$ acts on $H$ by conjugation. The number of elements in
an orbit has to be a divisor of $|G|$, and the smallest divisor of $|G|$ apart from 1 is $p$, by
hypothesis. Since $\{1\}$ is one orbit and there are only $p-1$ other elements in $H$, each
orbit must contain one element. Therefore $ghg^{-1}=h$ for each $g\in G$ and $h\in H$,
and each $h$ is in $Z_G$.


15. Certainly the inner automorphisms are closed under composition and inversion
and therefore form a subgroup. If $\varphi$ is an automorphism and $\psi$ is the inner automorphism
$\psi(x)=axa^{-1}$, then $\varphi\circ\psi\circ\varphi^{-1}(x)=\varphi(a\varphi^{-1}(x)a^{-1})=\varphi(a)x\varphi(a)^{-1}$ shows
that $\varphi\circ\psi\circ\varphi^{-1}$ is inner. Hence the subgroup of inner automorphisms is normal.
Define a mapping of $G$ into the inner automorphisms by $\Phi(a)=\{x\mapsto axa^{-1}\}$.
Then $\Phi(ab)=\Phi(a)\Phi(b)$, and hence $\Phi$ is a homomorphism. Certainly $\Phi$ is onto the
inner automorphisms, and its kernel consists of all elements $a\in G$ with $axa^{-1}=x$
for all $x$, hence consists of all $a$ in $Z_G$. Thus $\Phi$ exhibits $G/Z_G$ as isomorphic to the
group of inner automorphisms.

16. Part (a) is proved in the same way as Lemma 4.45. For (b), choose $m=8$;
then $\operatorname{Aut}C_m$ is $C_2\times C_2$.

17. In (a), each $C_k$ is a conjugacy class, by Proposition 4.42, and it is evident that
the $C_k$'s are the only conjugacy classes whose members have order 2. If $x$ and $y$ are
in $\mathfrak S_n$, then $\tau(xyx^{-1})=\tau(x)\tau(y)\tau(x)^{-1}$ shows that $\tau$ carries any conjugate of $y$ to
a conjugate of $\tau(y)$. Therefore conjugacy classes map to conjugacy classes under $\tau$,
and $\tau(C_1)$ has to be some $C_k$.

In (b), the number of ways of selecting $2k$ elements from $n$ is $\binom{n}{2k}$. For each
of these, the number of ways of selecting $k$ unordered pairs of elements from $2k$
elements is the multinomial coefficient $\binom{2k}{2,\,2,\,\dots,\,2}=\frac{(2k)!}{2^k}$. Although the individual pairs
are unordered, this enumeration counts one for each different ordering of the $k$ pairs.
There are $k!$ orderings, and hence the multinomial coefficient must be divided by $k!$
to discount the ordering of the pairs. Thus $|C_k|$ is the product of the integer $\binom{n}{2k}$
and the integer $\frac{(2k)!}{2^kk!}$.

In (c), we saw in (b) that $N_k=\frac{(2k)!}{2^kk!}$ is always an integer. Let us bound it below.
Canceling every even factor of the numerator by a factor of $k!$ and a factor of $2^k$, we
see that $N_k=(2k-1)(2k-3)(2k-5)\cdots(3)(1)$. Thus $N_k\ge2k-1$, with equality
only if $k\le2$. Also, for $k>1$, $N_k\ge(2k-1)(2k-3)$, with equality only if $k\le3$.

Now let us compare $|C_k|$ and $|C_1|$. We have $N_1=1$. Also, $|C_k|=\binom{n}{2k}\frac{(2k)!}{2^kk!}=
N_k\binom{n}{2k}$ and $|C_1|=N_1\binom n2=\binom n2$. The easy comparison is that $|C_k|\ge\binom{n}{2k}$, and this
is $>\binom n2=|C_1|$ unless $k=1$ or $|n-2k|\le2$. Thus $|C_k|>|C_1|$ unless $k$ equals 1
or $\frac12n$ or $\frac12(n-1)$ or $\frac12(n-2)$. We can discard $k=\frac12(n-2)$ because in this case
$|C_k|=N_k\binom{n}{n-2}=N_k\binom n2>\binom n2=|C_1|$ except when $k=1$.

Consider $k=\frac12(n-1)$ with $k>1$. Then $|C_1|=\frac12n(n-1)=nk$ and $|C_k|=
N_k\binom{n}{n-1}=nN_k$. From above, the latter is $\ge n(2k-1)>nk=|C_1|$.

Finally consider $k=\frac12n$ with $k>1$. Then $|C_1|=\frac12n(n-1)=(n-1)k$ and
$|C_k|=N_k\binom nn=N_k$. From above, the latter for $k>1$ is $\ge(2k-1)(2k-3)=
(n-1)(n-3)$, and this is $>(n-1)k=|C_1|$ unless $k\ge n-3$. When $k\ge n-3$,
we obtain $\frac12n\ge n-3$ and $n\le6$. Since $k=\frac12n$, $n$ has to be even with $n\le6$.
The case $n=6$ (with $k=3$) we are allowing, and the case $n=4$ with $k=2$ has
$|C_2|=3\ne6=|C_1|$. Thus the only exceptions have $k=1$ or $n=6$.
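The counting in (b) and the conclusion of (c) can be tabulated for small $n$. This sketch (helper names hypothetical) computes $|C_k|=\binom{n}{2k}\frac{(2k)!}{2^kk!}$ and lists every pair $(n,k)$ with $k>1$ and $|C_k|=|C_1|$; the argument above predicts that only $n=6$, $k=3$ appears, where both classes have 15 elements.

```python
from math import comb, factorial

def involution_class_size(n, k):
    """|C_k|: the number of products of k disjoint transpositions in S_n."""
    return comb(n, 2 * k) * factorial(2 * k) // (2 ** k * factorial(k))

def exceptions(max_n):
    """All (n, k) with 2 <= k <= n/2 and |C_k| == |C_1|, for n up to max_n."""
    return [(n, k) for n in range(2, max_n + 1)
            for k in range(2, n // 2 + 1)
            if involution_class_size(n, k) == involution_class_size(n, 1)]
```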
18. In the composition series given for $\mathfrak S_4$ in Section 8, take $G$ to be $\mathfrak A_4$, $N$ to
be the 4-element subgroup in the series, and $M$ to be the 2-element subgroup. For
another example, take $G$ to be the dihedral group $D_4$, $N$ to be the cyclic subgroup of
the 4 rotations, and $M$ to be the 2-element subgroup of $N$.

19. If $\mathrm{GCD}(r,s)=1$, define a homomorphism $\varphi:\mathbb Z\to(\mathbb Z/r\mathbb Z)\times(\mathbb Z/s\mathbb Z)$
by $\varphi(n)=(n\bmod r,\ n\bmod s)$. This is 0 for $n=rs$. Thus it descends to a
homomorphism $\bar\varphi:\mathbb Z/rs\mathbb Z\to(\mathbb Z/r\mathbb Z)\times(\mathbb Z/s\mathbb Z)$. The kernel of $\varphi$ consists of
all integers $n$ divisible by $r$ and $s$. Since $r$ and $s$ are relatively prime, such integers
are divisible by $rs$. Thus $\ker\varphi=rs\mathbb Z$, and $\bar\varphi$ is one-one. Since the domain and range
have the same number of elements, $\bar\varphi$ is onto.

Conversely if $\mathrm{GCD}(r,s)\ne1$, then some prime $p$ divides both $r$ and $s$. The number
of elements in $C_{rs}$ of order $p$ is then $p-1$, while the number of elements in $C_r\times C_s$
of order $p$ is $p(p-1)+(p-1)=p^2-1$. So $C_{rs}$ cannot be isomorphic to $C_r\times C_s$.
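Both halves of Problem 19 can be checked by brute force for small moduli. The first helper below tests only whether the particular map $n\mapsto(n\bmod r,\,n\bmod s)$ is one-one on $\mathbb Z/rs\mathbb Z$; showing that no isomorphism at all exists when $\mathrm{GCD}(r,s)\ne1$ needs the element count, which the second helper performs in additive notation. Both helper names are hypothetical.

```python
from itertools import product
from math import gcd

def crt_map_is_bijective(r, s):
    """Is n mod rs -> (n mod r, n mod s) one-one (hence onto)?"""
    images = {(n % r, n % s) for n in range(r * s)}
    return len(images) == r * s

def count_order_p(p, moduli):
    """Number of elements of order exactly p (p prime) in Z/m1 x Z/m2 x ..."""
    return sum(1 for x in product(*(range(m) for m in moduli))
               if any(x) and all((p * a) % m == 0 for a, m in zip(x, moduli)))
```

For $r=6$, $s=9$, $p=3$, the cyclic group $C_{54}$ has $p-1=2$ elements of order 3, while $C_6\times C_9$ has $p^2-1=8$.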


20. Three, namely $C_{27}$, $C_9\times C_3$, and $C_3\times C_3\times C_3$.

21. The matrix relating the bases is $C=\begin{pmatrix}3&2&5\\0&1&3\\0&1&5\end{pmatrix}$. A row interchange and a
column interchange move the entry 1 in the center to the upper left and give
$\begin{pmatrix}1&0&3\\2&3&5\\1&0&5\end{pmatrix}$. Two row operations and one column operation eliminate the other entries in the first
column and first row, yielding $\begin{pmatrix}1&0&0\\0&3&-1\\0&0&2\end{pmatrix}$. The remaining steps pass from there to
$$\begin{pmatrix}1&0&0\\0&-1&3\\0&2&0\end{pmatrix}\mapsto\begin{pmatrix}1&0&0\\0&1&-3\\0&2&0\end{pmatrix}\mapsto\begin{pmatrix}1&0&0\\0&1&-3\\0&0&6\end{pmatrix}\mapsto\begin{pmatrix}1&0&0\\0&1&0\\0&0&6\end{pmatrix}.$$
Hence $H=\mathbb Z\oplus\mathbb Z\oplus6\mathbb Z$, and $G/H\cong C_6$.
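The diagonal form reached by row and column reduction can be cross-checked with determinantal divisors: if $d_k$ is the GCD of all $k$-by-$k$ minors (with $d_0=1$), the diagonal entries are $d_k/d_{k-1}$. This is a different technique from the step-by-step reduction in the hint, chosen here because it is short to code; the helper names are our own, and the Laplace-expansion determinant is meant only for small matrices.

```python
from itertools import combinations
from math import gcd

def det(m):
    """Integer determinant by cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

def invariant_factors(c):
    """Diagonal entries d_k / d_{k-1}, where d_k = GCD of all k-by-k minors."""
    n = len(c)
    divisors = [1]
    for k in range(1, n + 1):
        g = 0
        for rows in combinations(range(n), k):
            for cols in combinations(range(n), k):
                g = gcd(g, det([[c[i][j] for j in cols] for i in rows]))
        divisors.append(g)
    return [d // prev if prev else 0 for prev, d in zip(divisors, divisors[1:])]
```

For the matrix $C$ of this problem, with rows (3, 2, 5), (0, 1, 3), (0, 1, 5), the result is `[1, 1, 6]`, matching $G/H\cong C_6$.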


22. Let the four generators for $G$ be $x_1,x_2,x_3,x_4$, and let the four generators for $H$
be $y_1,y_2,y_3,y_4$. Since each set is linearly independent over $\mathbb Q$, it is linearly independent
over $\mathbb Z$. The matrix $C$ of the $y_i$'s in terms of the $x_j$'s reduces by row and column operations to
$\begin{pmatrix}1&0&0&0\\0&1&0&0\\0&0&2&0\\0&0&0&2\end{pmatrix}$. Hence $G/H\cong C_2\times C_2$.

23. Each step of row reduction or column reduction preserves the rank of the matrix
regarded as a matrix over $\mathbb Q$, since row rank equals column rank. Following through the
steps of the procedure, we may assume that the matrix is diagonal with diagonal
entries $D_{11},\dots,D_{nn}$ with $D_{jj}\ne0$ exactly for $1\le j\le r$. Then $H=\bigoplus_{j=1}^rD_{jj}\mathbb Z$,
and we can read off that $H$ has rank $r$ and the $\mathbb Q$ rank of the matrix is $r$.


24. Let $G$ be an abelian group, and let $\widetilde G=\bigoplus_{g\in G}\mathbb Z$. For each $g$, form the
homomorphism $\varphi_g:\mathbb Z\to G$ given in additive notation by $\varphi_g(n)=ng$. Then the
universal mapping property of direct sums gives the desired homomorphism of the
free abelian group $\widetilde G$ onto $G$.


25. For (a), right translation by any element of $H\cap K$ sends $xH$ to itself and $yK$
to itself, hence sends $xH\cap yK$ to itself. Therefore $xH\cap yK$ is a union of left cosets
of $H\cap K$. We are to see that at most one left coset is involved. Thus suppose we have
two elements $g_1$ and $g_2$ in $xH\cap yK$. Write $g_1=xh_1=yk_1$ and $g_2=xh_2=yk_2$.
Then $g_2^{-1}g_1=h_2^{-1}h_1=k_2^{-1}k_1$, and $g_2^{-1}g_1$ is exhibited as in $H\cap K$. So $g_1$ is in
$g_2(H\cap K)$.

For (b), if the sets $x_1H,\dots,x_mH$ exhaust $G$ and the sets $y_1K,\dots,y_nK$ exhaust
$G$, then $G$ is the union of the $mn$ sets $x_iH\cap y_jK$. By (a), $G$ is exhibited as the union
of $\le mn$ left cosets of $H\cap K$.

26. Returning to Problem 23, we see that $H=\bigoplus_{j=1}^nD_{jj}\mathbb Z$ with each $D_{jj}\ne0$.
Then the index of $H$ in $G$ is $\prod_{j=1}^n|D_{jj}|$.

27. In (a), take $H_2$ = {(1), (1 2)(3 4), (1 3)(2 4), (1 4)(2 3), (1 3), (2 4),
(1 2 3 4), (1 4 3 2)}. The number of such subgroups is $2k+1$ and divides 3.
Since $H_2$ is not normal, the number is $>1$. Therefore it is 3.

In (b), take $H_3$ = {(1), (1 2 3), (1 3 2)}. The number of such subgroups is
$3k+1$ and divides 8. Since $H_3$ is not normal, the number is 4.

28. Disproof: In $\mathfrak S_3$, take $H$ = {(1), (1 2)}. Then $N(H)=H$, and this is not
normal.

29. Since $168/7=24$, the number of Sylow 7-subgroups is $7k+1$ and divides 24.
The group $G$ is assumed simple, and so $k\ne0$. Then $k$ must be 1, and there are 8
distinct Sylow 7-subgroups. Any two of these intersect only in the identity, and each
contains 6 elements of order 7. Hence there are 48 elements of order 7.
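The Sylow counting arguments in Problems 27, 29, and 33 all start from the same list of candidates: numbers congruent to 1 modulo $p$ that divide the part of $|G|$ prime to $p$. A small helper (the name is our own) makes that list explicit.

```python
def sylow_candidates(order, p):
    """Values n with n = 1 (mod p) that divide the p'-part of the group order."""
    m = order
    while m % p == 0:
        m //= p
    return [n for n in range(1, m + 1) if m % n == 0 and n % p == 1]
```

For a group of order 168 and $p=7$ this gives `[1, 8]`; simplicity rules out 1, leaving 8 Sylow 7-subgroups as in Problem 29.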

30. The number of Sylow $q$-subgroups is $qk+1$ and divides $p$, hence must be 1.
So $S_q$ is normal, and the set $S_pS_q$ of products is a subgroup. An argument in the proof
of Proposition 4.60 shows that each element of $G$ is uniquely a product of a member
of $S_p$ and a member of $S_q$, and hence $G$ is a semidirect product.

31. Let $\mathcal F$ be the set of subgroups conjugate to $H$, and form the action $G\times\mathcal F\to\mathcal F$
by conjugation. The isotropy subgroup at $H$ is $N(H)$, which must have index 1 or
index $p$ in $G$. If it has index 1, then $H$ is normal, and $|\mathcal F|=1$. Otherwise it has index
$p$. Then $N(H)=H$, the orbit of $H$ has $|G|/|H|=p$ elements, and $|\mathcal F|=p$.

32. In (a), the subgroup $H$ is a Sylow 2-subgroup, and the number of its conjugates
must then be $2k+1$ and divide $24/8=3$. Since $H$ is assumed not normal, the number
of conjugates has to be 3.

In (b), call the conjugates $H$, $H'$, and $H''$. Each member $g$ of $G$ acts on the
set $\{H,H',H''\}$ by conjugation of the subgroups, sending $H$ to $gHg^{-1}$, $H'$ to
$gH'g^{-1}$, and $H''$ to $gH''g^{-1}$. The result is that we obtain a function $\Phi$ from $G$ to the
permutation group $\mathfrak S_3$ on $\{H,H',H''\}$. This function $\Phi$ is a group homomorphism.

In (c), the subgroup $\ker\Phi$ is normal, and it is enough to show that this subgroup is
neither $\{1\}$ nor $G$. The image of $\Phi$ is not the identity subgroup since some member
$g$ of $G$ has $gHg^{-1}=H'$; thus $\ker\Phi\ne G$. Since $24/|\ker\Phi|=|G|/|\ker\Phi|=
|\operatorname{image}\Phi|\le6$, we have $6|\ker\Phi|\ge24$ and $|\ker\Phi|\ge4$; thus $\ker\Phi\ne\{1\}$.


33. Let $H$ be a Sylow 3-subgroup, of order 9. If $H$ is normal, then $G/H$ is a
group of order 4, necessarily either $C_2\times C_2$ or $C_4$. Both of these groups of order
4 are isomorphic to subgroups of $\mathfrak S_4$, and thus there is a nontrivial homomorphism
of $G$ onto a subgroup of order 4 in $\mathfrak S_4$.

If $H$ is not normal, then the number of conjugates of $H$ is $3k+1$ and divides
4. Then the number of conjugates must be 4. Arguing as in the previous problem,
we obtain a homomorphism of $G$ into $\mathfrak S_4$ by having each element $g$ of $G$ map to the
corresponding permutation of the conjugates of $H$. This homomorphism is nontrivial
since $H$ can be moved to any of its conjugates by some element of $G$ and since the
number of such conjugates is $>1$.

34. Let $K$ be a Sylow $q$-subgroup. The number of conjugates of $K$ is of the form
$kq+1$ and divides $2p$. If $k=0$, then $K$ is normal. This conclusion disposes of (a)
and the first statement of (b) for this case. We come back to the remainder of (b) for
this case in a moment.

If $k>1$, then $kq+1\le2p$ is impossible since $p<q$. Thus the only other
possibility besides $k=0$ is $k=1$. Then $q+1$ divides $2p$. So $q+1$ equals 1, 2, $p$, or
$2p$. Since $q>p$, the only possibility is $q+1=2p$. This completes the argument
for (a).

For the rest we may assume that $q+1=2p$. If either of $H$ or $K$ is normal, then
an argument in the proof of Proposition 4.60 shows that $HK$ is a subgroup with $pq$
elements. Since $2p=q+1$, $p$ divides $q+1$. If $p$ also divides $q-1$, then $p$ divides
the difference, which is 2, and we obtain a contradiction. So $p$ does not divide $q-1$,
and Proposition 4.60 shows that $HK$ is abelian, hence cyclic.

Thus we are reduced to the situation that $q+1=2p$ and $K$ is not normal; we are to
prove that $H$ is normal. We have seen in this case that the number of conjugates of $K$
is $q+1$, and hence the number of elements of order $q$ is $(q+1)(q-1)=2p(q-1)=
2pq-2p$. The number of conjugates of $H$ is of the form $lp+1$ and divides $2q$. If $l=0$,
then $H$ is normal, and we are done. If $l\ge1$, then the number of elements of order $p$ is
$(lp+1)(p-1)\ge(p+1)(p-1)=p^2-1$. Thus the total number of elements of order
1, $p$, or $q$ is $\ge1+(p^2-1)+(2pq-2p)=2pq+(p-1)^2-1\ge2pq+2^2-1>2pq$,
and we have obtained a contradiction.

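35. Certainly $\psi$ is one-one and onto. For $(h,k)$ and $(h',k')$ in $H\times_{\varphi_2}K$, we have
$$\psi((h,k)(h',k'))=\psi\bigl(hh',((\varphi_2)_{h'^{-1}}(k))k'\bigr)=\bigl(\varphi(hh'),((\varphi_2)_{h'^{-1}}(k))k'\bigr)$$
and
$$\psi(h,k)\psi(h',k')=(\varphi(h),k)(\varphi(h'),k')=\bigl(\varphi(h)\varphi(h'),((\varphi_1)_{\varphi(h')^{-1}}(k))k'\bigr).$$
The right sides are equal because $(\varphi_2)_{h'^{-1}}=(\varphi_1)_{\varphi(h')^{-1}}$.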

36. Again $\psi$ is visibly one-one and onto. The formula for $\varphi_2$ in terms of $\varphi_1$ is
given more concretely as $(\varphi_2)_h(k)=\alpha((\varphi_1)_h(\alpha^{-1}(k)))$. For $(h,k)$ and $(h',k')$ in
$H\times_{\varphi_1}K$, we then have
$$\begin{aligned}\psi((h,k)(h',k'))&=\psi\bigl(hh',((\varphi_1)_{h'^{-1}}(k))k'\bigr)\\
&=\bigl(hh',\alpha(((\varphi_1)_{h'^{-1}}(k))k')\bigr)=\bigl(hh',\alpha((\varphi_1)_{h'^{-1}}(k))\alpha(k')\bigr)\end{aligned}$$
and
$$\psi(h,k)\psi(h',k')=(h,\alpha(k))(h',\alpha(k'))=\bigl(hh',((\varphi_2)_{h'^{-1}}(\alpha(k)))\alpha(k')\bigr).$$
The right sides are equal because $\alpha^{-1}(\alpha(k))=k$.

37. An action of $C_p$ on $C_q$ is a homomorphism of $C_p$ into $\operatorname{Aut}C_q\cong C_{q-1}$. If $a$
is a generator of $C_p$ and $b$ is a generator of $C_{q-1}$, we may assume that $a\mapsto b^k$ for
some $k$. Since the action is nontrivial, $0<k<q-1$. Then $1=a^p$ maps to $b^{kp}$,
and therefore $b^{kp}$ must be 1. This means that $kp$ must be a multiple of $q-1$. So
$kp=r(q-1)$. Since $0<k<q-1$, we see that $p>r$. Therefore $p$ does not
divide $r$ and must divide $q-1$.

38. Put $n=(q-1)/p$. Let $a$ be a generator of $C_p$, and let $b$ be a generator
of $\operatorname{Aut}C_q\cong C_{q-1}$. For reference, take $\tau(a)=b^n$. This defines a nontrivial
homomorphism of $C_p$ into $C_{q-1}$. Any other one is of the form $\tau_1(a)=b^{k_1}$ with
$0<k_1<q-1$. As in the previous problem, we know that $k_1p=r(q-1)$. Hence
$k_1=nr$ for some $r$ with $1\le r\le p-1$. The mapping $\varphi(a^s)=a^{rs}$ is then an
automorphism of $C_p$, and $\tau_1(a)=b^{k_1}=b^{nr}=\tau(a^r)=(\tau\circ\varphi)(a)$. So $\tau_1=\tau\circ\varphi$.
Problem 35 applies and yields the desired isomorphism.

40. For (a), $D_4\supseteq C_4\supseteq C_2\supseteq\{1\}$, where $C_4$ is the subgroup of rotations. For (b),
$H_8\supseteq C_4\supseteq C_2\supseteq\{1\}$, where $C_4$ is the subgroup $\{\pm1,\pm i\}$.

41. For (a), the trivial subgroup, the whole group, and all subgroups of index 2 are automatically normal. The only other possibility is order 2. Since $-1$ is the only element of order 2, the only subgroup of order 2 is $\{\pm 1\}$. This is the center of $H_8$ and hence is normal.

For (b), the five conjugacy classes are $\{\pm i\}$, $\{\pm j\}$, $\{\pm k\}$, $\{-1\}$, and $\{1\}$.

For (c), Problem 15 shows that the inner automorphisms form a normal subgroup isomorphic to the quotient of $H_8$ by its center. The center is $\{\pm 1\}$, and thus the inner automorphisms form a subgroup of the group of all automorphisms isomorphic to $C_2 \times C_2$. The nontrivial inner automorphisms multiply two of $i, j, k$ by $-1$ and fix the third one. In addition, the cyclic map $i \mapsto j \mapsto k \mapsto i$ is an automorphism of order 3, and so is its square. One more automorphism fixes $i$ and has $j \mapsto k \mapsto -j \mapsto -k$. Consequently the group $G$ of automorphisms acts transitively on the set of six elements of order 4, and $|G| = 6|H|$, where $H$ is the subgroup fixing $i$. With $i$ fixed, an automorphism must carry $j$ to an element of order 4 other than $\pm i$, and the automorphism just described shows that each of $\pm j$ and $\pm k$ occurs. Thus $|H| = 4|K|$, where $K$ is the subgroup fixing $i$ and $j$. Since $i$ and $j$ generate $H_8$, $K$ is trivial. Hence $|\operatorname{Aut} H_8| = 24$.
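The count $|\operatorname{Aut} H_8| = 24$ can be double-checked by brute force. The sketch below encodes each element of $H_8$ as a pair (sign, unit) and builds the multiplication table from $ij = k$, $jk = i$, $ki = j$ and $i^2 = j^2 = k^2 = -1$. It then tests every permutation of the six elements of order 4 for the homomorphism property. The encoding and the function names are illustrative choices, not notation from the text.

```python
from itertools import permutations

# UNIT[(x, y)] = (sign, z) means x*y = sign*z for units x, y, z in {1, i, j, k}.
UNIT = {}
for u in "1ijk":
    UNIT[("1", u)] = (1, u)
    UNIT[(u, "1")] = (1, u)
for u in "ijk":
    UNIT[(u, u)] = (-1, "1")
for x, y, z in [("i", "j", "k"), ("j", "k", "i"), ("k", "i", "j")]:
    UNIT[(x, y)] = (1, z)
    UNIT[(y, x)] = (-1, z)

ELEMENTS = [(s, u) for s in (1, -1) for u in "1ijk"]

def mul(a, b):
    s, u = UNIT[(a[1], b[1])]
    return (a[0] * b[0] * s, u)

def automorphisms():
    # An automorphism fixes 1 and -1 (the unique element of order 2),
    # so only the six elements of order 4 need to be permuted.
    fixed = [(1, "1"), (-1, "1")]
    order4 = [e for e in ELEMENTS if e[1] != "1"]
    found = []
    for image in permutations(order4):
        f = dict(zip(fixed, fixed))
        f.update(zip(order4, image))
        if all(f[mul(a, b)] == mul(f[a], f[b]) for a in ELEMENTS for b in ELEMENTS):
            found.append(f)
    return found

print(len(automorphisms()))  # 24
```

A bijection that respects multiplication is automatically an automorphism, so counting the permutations that pass the test gives $|\operatorname{Aut} H_8|$.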
42. The only possible orders are the divisors of 8. If it were to have an element of order 8, it would be cyclic, hence abelian. If all elements other than the identity were to have order 2, it would be abelian by Problem 1. Hence it must have an element of order 4.

43. Let $C_2$ be the subgroup generated by the element of order 2. Proposition 4.44 shows that $G$ is a semidirect product $C_2 \times_\tau K$, and $\tau$ has to be nontrivial for $G$ to be nonabelian. By Problem 16a, there is only one possibility for $\tau$. Since $D_4$ is one such semidirect product, $G$ must be isomorphic to $D_4$.

44. Let the elements of $K$ be the powers of $i$. By assumption every element outside $K$ has order 4. Thus $i^2$ is the only element of order 2. Its conjugacy class therefore contains no other element, and it is central. Let us write $-1$ for this element. No element other than $\pm 1$ can be central, since if the center has order 4, then it commutes with any other element and together they generate an abelian $G$. So $Z_G = \{\pm 1\}$. Next let $j$ be an element of order 4 not in $K$. Define $k = ij$. We know that $j^2 = k^2 = -1$, and thus the 8 elements are $\pm 1, \pm i, \pm j, \pm k$. From $k = ij$, we obtain $kj = (i)(-1) = -i$ and similarly $ik = -j$. Finally we know that $i$ and $j$ do not commute (since $G$ would otherwise be abelian) and that neither $ij$ nor $ji$ is a power of $i$ or $j$. Thus $ji$ has to be $\pm k$ and cannot be $k$. So $ji = -k$, and we then obtain $jk = i$ and $ki = j$. Thus the multiplication table in $G$ matches that in $H_8$, and we have an isomorphism.

46. Suppose $K \cong C_4$. If $H$ acts nontrivially on $K$, then there is a nontrivial homomorphism of $H \cong C_3$ into $\operatorname{Aut} K \cong \operatorname{Aut} C_4 \cong C_2$. Since $C_2$ has no element of order 3, this is impossible.

If $K \cong C_2 \times C_2$, then $\operatorname{Aut} K \cong \mathfrak{S}_3$, the automorphisms being the permutations of the set $\{(1,0), (0,1), (1,1)\}$. Thus there are two nontrivial homomorphisms of $C_3$ into $\operatorname{Aut} K$. Since the elements of order 3 in $\mathfrak{S}_3$ are conjugate in $\mathfrak{S}_3$, Problem 36 applies and shows that the two resulting semidirect products are isomorphic. The group $\mathfrak{A}_4$ meets the conditions of this problem, and hence the given $G$ must be isomorphic to $\mathfrak{A}_4$.

47. Certainly one of those conditions holds, and $G$ is abelian if (i) holds. If (ii) holds, then $\tau$ has order 2, and $\tau$ is determined by its kernel. Let us rewrite the group $K$ as $C_2 \times C_2$ with the second factor as the kernel of $\tau$, so that $\tau$ factors through to a homomorphism of the first factor. Then $(C_2 \times C_2) \times_\tau C_3 \cong C_2 \times_\tau (C_2 \times C_3) \cong C_2 \times_\tau C_6 \cong D_6$. If (iii) holds, we have a nonnormal subgroup of order 4 in $G$, and this does not happen in $\mathfrak{A}_4$ or $D_4$.

48. If (iii) holds, the homomorphism $C_4 \to \operatorname{Aut} C_3$ has to be nontrivial and is then uniquely determined since $\operatorname{Aut} C_3 \cong C_2$. This proves the uniqueness of the group up to isomorphism. The group has 1 element of order 1, 1 element of order 2, 2 elements of order 3, 6 elements of order 4, and 2 elements of order 6.
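The element orders of this group can be checked by brute force. The sketch encodes the group as pairs $(m, n)$ standing for $(c^m, a^n)$, with $c$ a generator of $C_4$ and $a$ a generator of $C_3$. The generator $c$ acts on $C_3$ by inversion, the only nontrivial action. Multiplication follows the rule $(h,k)(h',k') = (hh', \tau_{h'^{-1}}(k)k')$ used for semidirect products in Problem 53. The pair encoding is an illustrative choice.

```python
from collections import Counter

def mul(x, y):
    # (m, n) <-> (c^m, a^n); tau_c inverts a, so tau_{c^{-p}} multiplies
    # the exponent of a by (-1)^p.
    (m, n), (p, q) = x, y
    return ((m + p) % 4, ((-1) ** p * n + q) % 3)

def order(x):
    identity, k, y = (0, 0), 1, x
    while y != identity:
        y = mul(y, x)
        k += 1
    return k

group = [(m, n) for m in range(4) for n in range(3)]
counts = Counter(order(g) for g in group)
print(sorted(counts.items()))  # [(1, 1), (2, 1), (3, 2), (4, 6), (6, 2)]
```

Here $c^2$ is the only involution. Each $a^{\pm 1}c^2$ has order 6. Each element $a^n c^{\pm 1}$ squares to $c^2$ and so has order 4.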

49. Let $H$ be a Sylow $q$-subgroup, and let $K$ be a Sylow $p$-subgroup. The number of conjugates of $H$ is of the form $qk + 1$ and divides $p^2$. Since $p$ is prime, $qk + 1$ must be $1$, $p$, or $p^2$. If $H$ is not normal, then $k > 0$, and we cannot have $qk + 1 = p$ since $p < q$; therefore $qk + 1 = p^2$. In this case the number of elements of order $q$ is $(qk+1)(q-1) = p^2(q-1) = p^2q - p^2$, and a Sylow $p$-subgroup then accounts for all the remaining elements. Consequently $H$ not normal implies $K$ normal.

Now let us analyze what $k$ must be when $qk = p^2 - 1$. Since $q$ is prime, $q$ divides $p+1$ or $q$ divides $p-1$. But the condition that $q$ divides $p-1$ is impossible since $p < q$, and thus $q$ divides $p+1$. Since $2q > p + q > p + 1$, we must in fact have $q = p + 1$. Since all primes but 2 are odd, this says that $p = 2$ and $q = 3$. We conclude that either $p^2q = 12$ or else the condition $qk = p^2 - 1$ is impossible; when $qk = p^2 - 1$ fails, we have seen that $H$ is normal.
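The number-theoretic step, that $q \mid p^2 - 1$ with primes $p < q$ forces $(p, q) = (2, 3)$, can be spot-checked over a finite range. The search below only illustrates the argument; it is not a proof. The bound 200 is arbitrary.

```python
def is_prime(n):
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))

primes = [n for n in range(2, 200) if is_prime(n)]
# prime pairs p < q for which qk = p^2 - 1 has a solution k
solutions = [(p, q) for p in primes for q in primes if p < q and (p * p - 1) % q == 0]
print(solutions)  # [(2, 3)]
```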

50. We form three distinct semidirect products, two with Sylow $p$-subgroup $C_{p^2}$ and one with Sylow $p$-subgroup $C_p \times C_p$. For each, a Sylow $q$-subgroup $C_q$ is to be normal. We know from Problem 16a and Corollary 4.27 that the group of automorphisms of the cyclic group $C_q$ is isomorphic with $C_{q-1}$. We obtain one homomorphism $C_{p^2} \to C_{q-1}$ by mapping a generator of $C_{p^2}$ to an element in $C_{q-1}$ of order $p^2$, and a second homomorphism by mapping a generator of $C_{p^2}$ to an element in $C_{q-1}$ of order $p$. The third semidirect product comes by having the first factor $C_p$ of $C_p \times C_p$ act trivially on $C_q$ and having the second factor act with a generator of $C_p$ mapping to an element of order $p$ in $C_{q-1}$.

51. The second and third groups constructed in the previous problem make sense when $p$ divides $q - 1$.

52. If $p$ does not divide $q - 1$, then $p^2q \ne 12$. Problem 49 then shows that a Sylow $q$-subgroup is normal. Hence the group has to be a semidirect product. The action of a Sylow $p$-subgroup on $C_q$ corresponds to a homomorphism of $C_{p^2}$ or $C_p \times C_p$ into $C_{q-1}$, and the condition that $p$ not divide $q - 1$ means that $C_{p^2}$ or $C_p \times C_p$ must map to the identity. Therefore the group is abelian.

53. In (a) and (b), the automorphism group of $\mathbb{Z}/9\mathbb{Z}$ is given by multiplication by the members of $(\mathbb{Z}/9\mathbb{Z})^\times = \{1, 2, 4, 5, 7, 8\}$. The element 4 has square 7 and cube 1 modulo 9, and hence the multiplications by 1, 4, 7 yield a group of automorphisms of order 3 of $C_9$. Hence $C_3$ has a nontrivial action by automorphisms on $C_9$, and there exists a nonabelian semidirect product of $C_3$ and $C_9$ with $C_9$ normal.

In (c), let $a$ be a generator of $C_9$, let $b$ be a generator of $C_3$, and let $\tau_b$ be the automorphism $a \mapsto a^7$. Then $\tau_{b^{-1}}$ is the automorphism $a^n \mapsto a^{4n}$, and $\tau_{b^{-p}}(a^n) = a^{4^p n}$. Proposition 4.43 says that $(b^m, a^n)(b^p, a^q) = (b^{m+p}, (\tau_{b^{-p}}(a^n))a^q)$, and the right side equals $(b^{m+p}, a^{4^p n + q})$. Taking $m = -1$, $n = 1$, $p = 1$, and $q = 0$, we obtain $(b^{-1}, a)(b, 1) = (1, a^4)$. Abbreviating $(1, a)$ as $a$ and $(b, 1)$ as $b$, we obtain $a^9 = b^3 = b^{-1}aba^{-4} = 1$.
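The multiplication rule in (c) can be implemented directly to confirm three things: it is associative, the relation $b^{-1}ab = a^4$ holds, and the group is nonabelian. The following is a minimal sketch in which the pair $(m, n)$ stands for $(b^m, a^n)$. The encoding is an illustrative choice.

```python
def mul(x, y):
    # (b^m, a^n)(b^p, a^q) = (b^{m+p}, a^{4^p n + q}), with exponents of b mod 3
    # and of a mod 9; 4^p mod 9 only depends on p mod 3 since 4^3 = 64 = 1 mod 9
    (m, n), (p, q) = x, y
    return ((m + p) % 3, (pow(4, p, 9) * n + q) % 9)

a, b, b_inv = (0, 1), (1, 0), (2, 0)
group = [(m, n) for m in range(3) for n in range(9)]

# associativity over all 27^3 triples, so the rule defines a group of order 27
assert all(mul(mul(x, y), z) == mul(x, mul(y, z)) for x in group for y in group for z in group)
print(mul(mul(b_inv, a), b))   # (0, 4), i.e. b^{-1} a b = a^4
print(mul(a, b) == mul(b, a))  # False: the group is nonabelian
```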

54. In such a group the subgroup $H$ is normal by Proposition 4.36, and thus the group of order 27 is a semidirect product of $C_3$ and $C_9$ with $C_9$ normal. A nonabelian such semidirect product must have a generator of $C_3$ mapping into an automorphism of order 3 of $C_9$. There are two possibilities, and Problem 35 shows that they lead to isomorphic semidirect products.
55. $|GL(2,F)| = (q^2-1)(q^2-q)$ and $|SL(2,F)| = (q-1)^{-1}|GL(2,F)|$ because $|GL(2,F)| = |\ker \det|\,|\operatorname{image} \det|$ and $\det$ maps $GL(2,F)$ onto $F^\times$, which has $q - 1$ elements. This handles (a) and (b). For (c), the scalar matrices of determinant 1 are those for which the scalar has square 1. Since the characteristic is not 2, both $\pm 1$ qualify. Since $F$ is a field, the polynomial $X^2 - 1$ can have only two roots. So we factor by a group of order 2, and the number of elements is cut in half. For (d), the order in general is $\frac{1}{2}(q-1)q(q+1)$. Then

$|PSL(2,\mathbb{F}_7)| = \frac{1}{2} \cdot 6 \cdot 7 \cdot 8 = 168$.
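For $q = 7$ these orders can be confirmed by counting all $7^4$ matrices over $\mathbb{F}_7$ directly. The last step uses the fact, shown in (c), that the center of $SL(2, F)$ is $\{\pm I\}$ of order 2. The following is a minimal sketch.

```python
from itertools import product

q = 7  # the prime field F_7
matrices = list(product(range(q), repeat=4))      # (a, b, c, d) <-> [[a, b], [c, d]]
dets = [(a * d - b * c) % q for a, b, c, d in matrices]

gl = sum(1 for d in dets if d != 0)
sl = sum(1 for d in dets if d == 1)
psl = sl // 2    # SL(2, F_7) modulo its center {I, -I}

print(gl, sl, psl)   # 2016 336 168
assert gl == (q**2 - 1) * (q**2 - q)
assert psl == (q - 1) * q * (q + 1) // 2
```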

56.  Regard  G  as  a  group  of  invertible  linear  mappings  that  is  to  be  written  in  the 
standard  basis  £.  Let  T  =  (m,  w).  If  A  =  (  pp  then  A  =  ^  the  upper  right 

entry  being  —  1  because  det  M  —  1 .  Then  ^  (  sr  )  '  Products 

$AB$ go into products of such expressions, and conjugates $hAh^{-1}$ by matrices of determinant 1 go into expressions

^  (  et)  (  Sr)  '  H  (  sr)  A  (  et)"  M  (  et)  ^  (  Er)"*  )~* 

that  are  conjugates  of  such  expressions.  Thus  if  A  and  such  expressions  generate 
SL(2,  F),  then  the  conjugates  generate  the  conjugates,  again  giving  SL(2,  F). 

57. In (a), $B^{-1}A^{-1}BA$ is the product of the conjugate $B^{-1}A^{-1}B$ of the inverse of $A$ by $A$ itself and hence is in $G$. Direct computation shows that the matrix in question
is  ^  c{'a  2_1) )  •  ln  (b),  the  diagonal  entries  are  equal  if  and  only  if  a~ 2  =  a2,  hence 

if and only if $a^4 = 1$. In (c), the result of (b) shows that there are at most 4 choices of $a$ to avoid. We must also avoid $a = 0$. Thus if the field has more than 5 elements, $a$ can be chosen nonzero so that $a^4 \ne 1$.

58.  As  in  Problem  57a,  the  conditions  that  C  is  in  G  and  det  D  —  1  imply  that 
C  DC~X  D~l  is  in  G.  The  product  in  question  is  ^  4  x"“1  j.  Since  x  ^  ±1,  A  =  x2  —  1 
is  not  0. 

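59. Let $\Lambda$ be the set of $\lambda$ such that $E(\lambda) = \begin{pmatrix} 1 & \lambda \\ 0 & 1 \end{pmatrix}$ is in $G$. Since $E(\lambda + \lambda') = E(\lambda)E(\lambda')$ and $E(\lambda)^{-1} = E(-\lambda)$, $\Lambda$ is closed under addition and negation. Since $\begin{pmatrix} a & 0 \\ 0 & a^{-1} \end{pmatrix} E(\lambda) \begin{pmatrix} a & 0 \\ 0 & a^{-1} \end{pmatrix}^{-1} = E(a^2\lambda)$, $\Lambda$ is closed under multiplication by squares of nonzero elements.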

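60. The previous problems produce some $\lambda_0 \ne 0$ in $\Lambda$, and $-\lambda_0$ is in $\Lambda$ since $\Lambda$ is closed under negatives. If $x \ne \pm 1$, then $\frac{1}{4}(x+1)^2$ and $\frac{1}{4}(x-1)^2$ are nonzero squares, and hence $\frac{1}{4}(x+1)^2\lambda_0$ and $\frac{1}{4}(x-1)^2\lambda_0$ are in $\Lambda$. Subtracting, we see that $x\lambda_0$ is in $\Lambda$, since $\frac{1}{4}(x+1)^2 - \frac{1}{4}(x-1)^2 = x$. Thus all multiples of $\lambda_0$ except possibly for those by $0, +1, -1$ are in $\Lambda$. However, we have seen separately that $0, \lambda_0, -\lambda_0$ are in $\Lambda$. Hence $\Lambda = F$.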

61.  The  conjugacy  follows  from  (Ao^oi^-io)  =  (-A)'  ^ext  we 
have  (oi)(*i)(oi)  =  (  'A*  ^  A+ A  )  ’  an(^  4t  f°H°ws  that  every  member  of 
$SL(2,F)$ with lower left entry nonzero is in $G$. Conjugating by we obtain

the  same  conclusion  when  the  upper  right  entry  is  nonzero.  Finally  (  g  [  )  (  g  ff_i  ^  = 

(; ^  )  =  (;?!)(; )  »<>  si>»»s  “  ^  . ) *  "> 

G.  Hence  G  =  SL(2,  F). 

62. Let $\varphi : SL(2,F) \to PSL(2,F)$ be the quotient homomorphism. If $H$ is a normal subgroup $\ne \{1\}$ in $PSL(2,F)$, then $\varphi^{-1}(H)$ is a normal subgroup of $SL(2,F)$ containing an element not in the center. By Problem 61, $\varphi^{-1}(H) = SL(2,F)$. Therefore $H = \varphi(\varphi^{-1}(H)) = \varphi(SL(2,F)) = PSL(2,F)$.

63. If $a$ differs from $c$ in a set $A$ of $k$ places and if $b$ differs from $c$ in a set $B$ of $l$ places, then $a$ differs from $b$ at most in the places of $A \cup B$, hence in at most $k + l$ places. Therefore $d(a,b) \le d(a,c) + d(c,b)$.

If $d(w,a) \le (D-1)/2$ and $d(w,b) \le (D-1)/2$ with $a$ and $b$ distinct in $C$, then it follows that $d(a,b) \le D - 1$ and hence that $\delta(C) = \min_{x \ne y \text{ in } C} d(x,y) \le d(a,b) \le D - 1 < D$.

64. Since $C$ is linear, $0$ is in $C$. Then $\delta(C) \le d(0,c)$ for every nonzero $c$ in $C$, and we obtain $\delta(C) \le \min_{c \in C,\, c \ne 0} d(0,c)$. On the other hand, we certainly have $d(a,b) = d(0, a-b)$ for all $a, b$ in $F^n$. If $a$ and $b$ are distinct members of $C$, then the linearity of $C$ forces $a - b$ to be a nonzero member of $C$, and hence $d(a,b) = d(0, a-b) \ge \min_{c \in C,\, c \ne 0} d(0,c)$. Taking the minimum over all such $a$ and $b$, we obtain $\delta(C) \ge \min_{c \in C,\, c \ne 0} d(0,c)$. Hence equality holds.
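The equality of minimum distance and minimum nonzero weight can be illustrated on a small binary linear code. The code below is a hypothetical example spanned by three arbitrarily chosen vectors of length 6, stored as bit masks. Over $\mathbb{F}_2$, the difference $x - y$ is the bitwise XOR of $x$ and $y$.

```python
from itertools import combinations

def span(generators):
    """All F_2-linear combinations of the generators (vectors as bit masks)."""
    words = {0}
    for g in generators:
        words |= {w ^ g for w in words}
    return words

def weight(x):
    return bin(x).count("1")

# Hypothetical example: a 3-dimensional binary linear code of length 6
C = span([0b110100, 0b011010, 0b101001])
min_distance = min(weight(x ^ y) for x, y in combinations(C, 2))   # d(x, y) = wt(x - y)
min_weight = min(weight(c) for c in C if c != 0)
print(min_distance, min_weight)  # 3 3
```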

65. $n + 1$ and $0$, $1$ and $n$, $n$ and $1$, $2$ and $n - 1$.

66. In (a), a basis vector $c$ is 1 in one of the entries corresponding to the corner variables, and it is 0 in the other entries corresponding to corner variables. At worst it could be 1 in every entry corresponding to an independent variable. The number of independent variables is $n$ minus the rank, i.e., $n - \dim C$. Thus $\operatorname{wt}(c) \le 1 + n - \dim C$. Since $\delta(C) \le \operatorname{wt}(c)$, $\dim C + \delta(C) \le n + 1$.

For  (b),  one  can  take  the  parity-check  code. 

For (c), the alternative would be $\dim C + \delta(C) = n + 1$. Then $\dim C + \operatorname{wt}(c) \ge n + 1$ for every nonzero $c$ in $C$. Consequently every basis vector of $C$ must have a 1 in every position corresponding to an independent variable. Since $\dim C \ge 2$, there are at least two such basis vectors. Their sum gets a contribution of 2 to its weight from the corner variables and, having weight at least $n + 1 - \dim C$, can have a 0 in at most 1 position corresponding to an independent variable. But their sum is 0 in every position corresponding to an independent variable. Hence there is at most one such position, and we conclude that $n - \dim C \le 1$, in contradiction to the hypothesis $\dim C \le n - 2$.

67.  A  direct  check  of  all  seven  nonzero  elements  of  C  shows  that  each  has  weight 
3. Therefore $\delta(C) = 3$.

68. In (a), the basis vectors each have one 1 in positions 3, 5, 6, 7, and at least two of the parity bits in positions 1, 2, 4 are 1 since none of 3, 5, 6, 7 is a power of 2. Any sum of two distinct basis vectors has two 1's in positions 3, 5, 6, 7, and the parity bits cannot all be 0 since the parity bits for each of the basis vectors identify the basis vector and since the two basis vectors in question are distinct. Finally the sum of three or more basis vectors has 1 in three or more of the positions 3, 5, 6, 7 and hence has weight $\ge 3$. Thus all nonzero code words have weight $\ge 3$, and therefore $\delta(C_7) \ge 3$. Since the first basis vector has weight 3, $\delta(C_7) = 3$.

In (b), each word in $C_8$ is a word of $C_7$ plus a parity bit. The part from $C_7$ has weight $\ge 3$ by (a), and the parity bit means that the weight has to be even. Thus the weight of every nonzero word in $C_8$ is $\ge 4$.

In (c), for $C_{2^r-1}$ we distinguish between the $r$ bits whose indices are a power of two and the other $2^r - 1 - r$ bits. The first are the check bits, and the others are the message bits. The message bits are allowed to be arbitrary, and the check bits will depend on them. Thus $\dim C_{2^r-1} = 2^r - r - 1$. For a given pattern of message bits, the check bit in position $2^j$ counts, modulo 2, the number of 1's in message bits that occur in positions requiring $2^j$ in their binary expansions. Then $C_{2^r}$ is obtained by adjoining a parity bit to each word of $C_{2^r-1}$.

The  first  conclusion  of  (d)  was  proved  in  the  course  of  answering  (c),  and  the  other 
two  conclusions  follow  by  the  same  argument  that  was  given  for  r  =  3  in  (a)  and  (b). 
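The encoding rule in (c) translates directly into code. The sketch below builds $C_7$ and $C_8$ for $r = 3$ and checks the minimum weights from (a) and (b) by enumerating all $2^4$ messages. Positions are numbered from 1 to match the text. The function names are illustrative.

```python
from itertools import product

def hamming_encode(message_bits, r):
    """Word of length 2^r - 1 built by the check-bit rule of (c)."""
    n = 2 ** r - 1
    word = [0] * (n + 1)                     # index 0 unused so positions match the text
    message_positions = [i for i in range(1, n + 1) if (i & (i - 1)) != 0]  # not powers of 2
    for pos, bit in zip(message_positions, message_bits):
        word[pos] = bit
    for j in range(r):
        # check bit in position 2^j: parity of message bits whose position uses 2^j
        word[2 ** j] = sum(word[i] for i in message_positions if i & (2 ** j)) % 2
    return word[1:]

def extend(word):
    """Adjoin an overall parity bit."""
    return word + [sum(word) % 2]

r = 3
k = 2 ** r - r - 1                           # dimension 2^r - r - 1 = 4
C7 = [hamming_encode(m, r) for m in product([0, 1], repeat=k)]
C8 = [extend(w) for w in C7]
print(min(sum(w) for w in C7 if any(w)), min(sum(w) for w in C8 if any(w)))  # 3 4
```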

69. In (a), the dimension of the null space of $H$ is the number of columns minus the rank, hence is $7 - 3 = 4$. Since $C_7$ lies in the null space and $\dim C_7 = 4$, the null space equals $C_7$.

In (b), let $c$ be in $C_7$. If $e_i$ denotes the usual $i$th basis vector, then $H(c + e_i) = Hc + He_i = He_i$, and this is the $i$th column of $H$.
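The matrix $H$ itself is not reproduced in this excerpt. The sketch below assumes the common choice in which column $i$ of $H$ is the binary expansion of $i$. That choice is consistent with the check-bit rule of Problem 68(c). Under this assumption, $H$ applied to a word is the XOR of the positions that hold a 1. A single flipped bit therefore reports its own position.

```python
def syndrome(word):
    """H applied to a word, assuming column i of H is the binary expansion of i."""
    s = 0
    for i, bit in enumerate(word, start=1):
        if bit:
            s ^= i
    return s   # an integer whose binary digits are the entries of H(word)

codeword = [1, 0, 1, 0, 1, 0, 1]   # a word of C_7 (message bits 1, 1, 0, 1 in positions 3, 5, 6, 7)
assert syndrome(codeword) == 0
received = codeword[:]
received[4] ^= 1                   # flip the bit in position 5
print(syndrome(received))          # 5: the column of H for position 5
```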

70. Take a basis of $C$, write it as the rows of a matrix, row reduce the matrix, and permute the variables so that all the corner variables precede all the independent variables. The resulting matrix in block form is $(I\ A)$ for some matrix $A$ with $\dim C$ rows and $n - \dim C$ columns. Since each basis vector has weight $\ge 3$, each row of $A$ has at least two 1's. Since each sum of two distinct basis vectors has weight $\ge 3$, the sum of two distinct rows of $A$ cannot be 0. Thus the rows of $A$ must be distinct.

Arguing by contradiction, suppose that $\dim C > n - r$, so that $A$ has $\le r - 1$ columns. The number of possible rows in $A$ with at least two 1's is then $\le 2^{r-1} - 1 - (r-1) = 2^{r-1} - r$. Hence $n - r < \dim C \le 2^{r-1} - r$, and $n < 2^{r-1}$, contradiction.

71. For (a), the answers are $X^n$, $(X+Y)^n$, $X^n + Y^n$, $\frac{1}{2}((X+Y)^n + (X-Y)^n)$,

X6 + 1 X3Y3, $X^7 + 7X^4Y^3 + 7X^3Y^4 + Y^7$, and $X^8 + 14X^4Y^4 + Y^8$. The last three are by a direct count of the number of code words of each weight.

In  (b),  the  0  word  is  the  unique  code  word  of  weight  0,  and  it  is  present  in  every 
linear  code. 

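In (c), the expression $X^{n-\operatorname{wt}(c)}Y^{\operatorname{wt}(c)}$ makes a contribution of 0 to the coefficient $N_k(C)$ of $X^{n-k}Y^k$ if $\operatorname{wt}(c) \ne k$ and makes a contribution of 1 to the coefficient if $\operatorname{wt}(c) = k$. Summing on $c$ yields $\sum_k N_k(C)X^{n-k}Y^k = \sum_{c \in C} X^{n-\operatorname{wt}(c)}Y^{\operatorname{wt}(c)}$.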
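The last two enumerators in (a) can be reproduced by counting code words of each weight. The sketch below spans $C_7$ by the four basis vectors that the check-bit rule of Problem 68 assigns to message positions 3, 5, 6, 7. It then forms $C_8$ by adjoining the parity bit. The dictionary maps $k$ to $N_k(C)$.

```python
from collections import Counter
from itertools import product

# Basis of C_7 (positions 1..7), one row for each message position 3, 5, 6, 7.
BASIS_C7 = [
    (1, 1, 1, 0, 0, 0, 0),
    (1, 0, 0, 1, 1, 0, 0),
    (0, 1, 0, 1, 0, 1, 0),
    (1, 1, 0, 1, 0, 0, 1),
]

def codewords(basis):
    n = len(basis[0])
    for coeffs in product([0, 1], repeat=len(basis)):
        yield tuple(sum(c * row[i] for c, row in zip(coeffs, basis)) % 2 for i in range(n))

def weight_distribution(words):
    """N_k(C): the coefficient of X^(n-k) Y^k in the weight enumerator."""
    return dict(sorted(Counter(sum(w) for w in words).items()))

C7 = list(codewords(BASIS_C7))
C8 = [w + (sum(w) % 2,) for w in C7]   # adjoin the parity bit
print(weight_distribution(C7))  # {0: 1, 3: 7, 4: 7, 7: 1}
print(weight_distribution(C8))  # {0: 1, 4: 14, 8: 1}
```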

72. The equality $(1 + X + X^2 + X^4)(1 + X + X^3) = 1 + X^7$ produces a member of $C$ with weight 2. Therefore $\delta(C) \le 2$. On the other hand, the product of $1 + X + X^2 + X^4$ with a nonzero polynomial can never be a monomial, since its lowest-degree and highest-degree terms differ in degree by 4. Therefore no code word has weight 1. Thus $\delta(C) > 1$.
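Polynomials over $\mathbb{F}_2$ can be stored as bit masks, with bit $i$ holding the coefficient of $X^i$. Multiplication is then a shift-and-XOR loop. The following is a minimal sketch that checks the identity above. It also checks, for all nonzero multipliers of degree below 12, that no multiple of $1 + X + X^2 + X^4$ is a monomial.

```python
def gf2_mul(a, b):
    """Multiply polynomials over F_2 encoded as bit masks (bit i <-> X^i)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result

g = 0b10111        # 1 + X + X^2 + X^4
h = 0b1011         # 1 + X + X^3
print(bin(gf2_mul(g, h)))   # 0b10000001, i.e. 1 + X^7

# every nonzero multiple p*g with deg p < 12 has at least two terms
assert all(bin(gf2_mul(g, p)).count("1") >= 2 for p in range(1, 1 << 12))
```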
73. In essence we use the method suggested by the solution to Problem 70, except that we put coefficients corresponding to low degrees on the left and we row reduce the matrix into the form $(A\ I)$. Let $8 < n < 19$. Form the images of as many of the following polynomials as have degree $< n$:

$1, X, X^2, X^3, X^4, X^5, X^6 + 1, X^7 + X + 1, X^k(X^8 + X^2 + X + 1)$ for $k \ge 0$.

The list stops with $k = n - 16$. Assemble the coefficients of the image polynomials as the rows of a matrix as in Problem 70. The images form a basis of $C$. They all have weight 4, and thus every member of $C$ has even weight. Since the image of 1 has weight 4, $\delta(C)$ must be 2 or 4.

Imagine doing a row reduction as in the solution of Problem 70. We want to rule out $\delta(C) = 2$, and it is enough to show that the basis vectors and all sums of two distinct basis vectors have weight $> 2$. To handle the basis vectors, it is enough to show that the $A$ part of the reduced matrix $(A\ I)$ never has just one 1 in a row. To handle the sums of two distinct basis vectors, it is enough to show that the sum of two rows of $A$ is never 0, i.e., that the rows of $A$ are distinct.

The matrix $A$ will have 8 columns, corresponding to powers $X^i$ with $i \le 7$. The rows of $(A\ I)$ are thus to correspond to polynomials of the form $X^m +$ “lower,” where each expression “lower” has degree at most 7 and $m$ takes on the values $8, 9, \dots$. The polynomials whose images correspond to the rows of the reduced matrix are

$1, X, \dots, X^5, X^6 + 1, X^7 + X + 1,$

$X^8 + X^2 + X + 1, X(X^8 + X^2 + X + 1), \dots, X^3(X^8 + X^2 + X + 1),$

and  the  left  part  A  of  the  reduced  matrix  is 

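$$A = \begin{pmatrix}
1&1&1&0&0&0&0&0\\
0&1&1&1&0&0&0&0\\
0&0&1&1&1&0&0&0\\
0&0&0&1&1&1&0&0\\
0&0&0&0&1&1&1&0\\
0&0&0&0&0&1&1&1\\
1&1&1&0&0&0&1&1\\
1&0&0&1&0&0&0&1\\
1&0&1&0&1&0&0&0\\
0&1&0&1&0&1&0&0\\
0&0&1&0&1&0&1&0\\
0&0&0&1&0&1&0&1
\end{pmatrix}$$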

No  row  of  A  is  0,  and  no  two  distinct  rows  are  equal.  This  completes  the  proof. 
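The matrix $A$ and the conclusion $\delta(C) = 4$ can be checked mechanically. This excerpt does not spell out the “image” map. The sketch assumes that the image of a polynomial $p$ is the product $p \cdot (X^8 + X^2 + X + 1)$ over $\mathbb{F}_2$. That assumption reproduces every row of $A$ above, each with leading term $X^m$ for $m = 8, \dots, 19$. The code then enumerates all $2^{12}$ code words to find the minimum weight.

```python
G = 0b100000111  # X^8 + X^2 + X + 1 (assumed multiplier; bit i <-> coefficient of X^i)

def gf2_mul(a, b):
    """Multiply two polynomials over F_2 given as bit masks."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result

# The listed polynomials: 1, X, ..., X^5, X^6 + 1, X^7 + X + 1, X^k (X^8 + X^2 + X + 1), k = 0..3.
polys = [1 << i for i in range(6)] + [0b1000001, 0b10000011] + [G << k for k in range(4)]
images = [gf2_mul(p, G) for p in polys]

def low_part(word):
    """Coefficients of X^0, ..., X^7 as a string, matching the columns of A."""
    return "".join(str((word >> i) & 1) for i in range(8))

A_rows = [low_part(w) for w in images]
EXPECTED_A = ["11100000", "01110000", "00111000", "00011100", "00001110", "00000111",
              "11100011", "10010001", "10101000", "01010100", "00101010", "00010101"]

# row i of (A I) is X^(8+i) + lower, so the part above degree 7 is a single 1
assert all(w >> 8 == 1 << i for i, w in enumerate(images))
assert A_rows == EXPECTED_A
assert len(set(A_rows)) == len(A_rows) and all(r.count("1") >= 2 for r in A_rows)

# brute-force minimum weight over the 2^12 - 1 nonzero code words
codewords = {0}
for w in images:
    codewords |= {c ^ w for c in codewords}
min_weight = min(bin(c).count("1") for c in codewords if c)
print(min_weight)  # 4
```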

74. Suppose that $\{X_s\}_{s \in S}$ is an object in $C_S$ and that $f_s : X_s \to A$ for each $s$ is a function, $A$ being a particular set. The disjoint union of the $X_s$'s consists of all ordered pairs $(x_s, s)$ with $s \in S$ and $x_s \in X_s$, and we define $i_s(x_s) = (x_s, s)$. To define a function $f$ from the disjoint union of the $X_s$'s into $A$ such that $f i_s = f_s$ for all $s$, we let $f(x_s, s) = f_s(x_s)$. Then $f i_s(x_s) = f(x_s, s) = f_s(x_s)$. Thus $f$ exists. On the other hand, the condition that $f i_s = f_s$ forces $f(x_s, s)$ to be $f_s(x_s)$, and hence the $f$ in the universal mapping property is unique, as it is required to be.
76. Peeking ahead to Problem 80, we take the category to be $C^{\text{opp}}$, where $C$ is the category defined in Section 11 after Example 4 of products. The category $C$ has no product functor when $S$ has two elements.

77.  The  existence  of  the  identity  and  associativity  are  part  of  the  definition.  The 
existence  of  inverses  is  given  in  the  hypothesis.  The  answer  to  the  question  is  “yes”; 
if  a  group  G  is  given,  define  a  category  with  one  object,  namely  the  set  G,  define 
Morph(G,  G)  to  be  the  set  G,  and  let  the  law  of  composition  be  the  group  law. 

78. To see that $\circ^{\text{opp}}$ is well defined, let $f$ be in $\operatorname{Morph}_{C^{\text{opp}}}(A, B)$, and let $g$ be in $\operatorname{Morph}_{C^{\text{opp}}}(B, C)$. The definition is $g \circ^{\text{opp}} f = f \circ g$, and this is meaningful since $g$ is in $\operatorname{Morph}_C(C, B)$ and $f$ is in $\operatorname{Morph}_C(B, A)$. The associativity and the existence of the identity are straightforward to check. It is clear from the definition that $(C^{\text{opp}})^{\text{opp}} = C$.

In a diagram the vertices stay where they are, and so do the morphisms, since the objects and the sets of morphisms do not change. However, the direction of each arrow is reversed since “domain” and “range” are interchanged in passing from $C$ to $C^{\text{opp}}$. Thus diagrams map to diagrams with the arrows reversed.

Compositions correspond because of the definition of $\circ^{\text{opp}}$, and it follows that commutative diagrams map to commutative diagrams.

79. Let $A$ and $B$ be sets such that $A$ has three elements and $B$ has one element. The number of functions from $A$ to $B$ is then one, and the number of functions from $B$ to $A$ is three. Since $\operatorname{Morph}_{C^{\text{opp}}}(A, B) = \operatorname{Morph}_C(B, A)$, $\operatorname{Morph}_{C^{\text{opp}}}(A, B)$ has three elements and cannot be accounted for by functions from $A$ to $B$.

80. For (a), if $(X, \{p_s\}_{s \in S})$ is a product of $\{X_s\}_{s \in S}$, we set up the diagram of the universal mapping property of the product. Passing to $C^{\text{opp}}$ and using Problem 78, we obtain the same diagram in $C^{\text{opp}}$ but with the arrows reversed. Then it follows that $(X, \{p_s\}_{s \in S})$, when interpreted in $C^{\text{opp}}$, satisfies the condition of being a coproduct. The other half proceeds in the same way.

For (b), we start with two coproducts in $C$ and pass to $C^{\text{opp}}$, where they become products, according to (a). Proposition 4.63 shows that the two products are canonically isomorphic in $C^{\text{opp}}$. This isomorphism, when reinterpreted in $C$, is still an isomorphism, and the result is that the two coproducts in $C$ are canonically isomorphic.


Chapter  V 

1. For (a), we have $(g_1,h_1)((g_2,h_2)x) = (g_1,h_1)(g_2xh_2^{-1}) = g_1g_2xh_2^{-1}h_1^{-1} = (g_1g_2)x(h_1h_2)^{-1} = (g_1g_2, h_1h_2)x$ and $(1,1)x = 1x1^{-1} = x$.

For (b), left multiplications by $GL(m, \mathbb{C})$ preserve the row space, hence the rank, and right multiplications by $GL(n, \mathbb{C})$ preserve the column space, hence the rank. Hence all members of an orbit have the same rank.

Row operations, which correspond to left multiplications by elementary matrices, can be used to bring the matrix into reduced row-echelon form, and then column operations, which correspond to right multiplications by elementary matrices, can be used to bring the result into reduced column-echelon form. If $r = \min(m, n)$, then the resulting matrix is 1 in entries $(1,1), (2,2), \dots, (l,l)$ for some $l \le r$ and 0 elsewhere. This has rank $l$ and answers (c) and the remainder of (b).

2. If $A$ has minimal polynomial $X^k + c_{k-1}X^{k-1} + \cdots + c_1X + c_0$ with $c_0 \ne 0$, then $I = A\bigl(-c_0^{-1}(A^{k-1} + c_{k-1}A^{k-2} + \cdots + c_1I)\bigr)$, and $A$ is invertible. Conversely if $c_0 = 0$, then $X$ is a factor of the minimal polynomial and must be a factor of the characteristic polynomial, by Corollary 5.10. Then 0 is an eigenvalue, and the null space is nonzero. Hence $A$ is not invertible.
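The formula for $A^{-1}$ in terms of the minimal polynomial can be evaluated by Horner's scheme. The following is a minimal sketch using exact rational arithmetic. The example matrix is a hypothetical choice: a non-scalar $2 \times 2$ matrix, whose minimal polynomial is therefore its characteristic polynomial $X^2 - (\operatorname{tr} A)X + \det A$.

```python
from fractions import Fraction

def mat_mul(A, B):
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

def inverse_from_min_poly(A, coeffs):
    """A^{-1} = -c_0^{-1}(A^{k-1} + c_{k-1}A^{k-2} + ... + c_1 I), where
    coeffs = [c_0, c_1, ..., c_{k-1}] are the non-leading coefficients of the
    monic minimal polynomial of A and c_0 != 0."""
    I = identity(len(A))
    B = I                                   # Horner's scheme, leading coefficient 1
    for c in reversed(coeffs[1:]):          # c_{k-1}, ..., c_1
        B = [[x + c * y for x, y in zip(r1, r2)] for r1, r2 in zip(mat_mul(B, A), I)]
    scale = Fraction(-1) / coeffs[0]
    return [[scale * x for x in row] for row in B]

A = [[Fraction(2), Fraction(1)], [Fraction(1), Fraction(1)]]
# minimal polynomial X^2 - 3X + 1 (trace 3, determinant 1): c_0 = 1, c_1 = -3
A_inv = inverse_from_min_poly(A, [Fraction(1), Fraction(-3)])
assert mat_mul(A, A_inv) == identity(2)
```

Here the loop yields $A - 3I$, and multiplying by $-c_0^{-1} = -1$ gives $A^{-1} = 3I - A$.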

3. Proposition 5.12 shows that $l_j \ge \max(r_j, s_j)$. For $u$ in $U$, we know that $P_1(L)^{r_1} \cdots P_k(L)^{r_k}(u) = 0$. For $w$ in $W$, we know that $P_1(L)^{s_1} \cdots P_k(L)^{s_k}(w) = 0$. Thus any $v$ in $U$ or $W$ has $P_1(L)^{\max(r_1,s_1)} \cdots P_k(L)^{\max(r_k,s_k)}(v) = 0$. Forming sums, we see that $P_1(L)^{\max(r_1,s_1)} \cdots P_k(L)^{\max(r_k,s_k)}(v) = 0$ for all $v$ in $V$. Thus the minimal polynomial divides $P_1(X)^{\max(r_1,s_1)} \cdots P_k(X)^{\max(r_k,s_k)}$, and we must have $l_j \le \max(r_j, s_j)$.

4. For any monomial $P(X) = X^j$, the monomial $Q(X) = XP(X) = X^{j+1}$ has $Q(BA) = BA(BA)^j = B(AB)^jA = BP(AB)A$. Taking suitable linear combinations of this result as $j$ varies, we obtain (a).

For (b), let $M_{AB}(X)$ and $M_{BA}(X)$ be the minimal polynomials of $AB$ and $BA$. Part (a) implies that $M_{BA}(X)$ divides $XM_{AB}(X)$. Reversing the roles of $A$ and $B$, we see that $M_{AB}(X)$ divides $XM_{BA}(X)$. By unique factorization all the prime powers in the prime factorizations of $M_{AB}(X)$ and $M_{BA}(X)$ are the same except for the power of $X$. The powers of $X$ in the factorizations of $M_{AB}(X)$ and $M_{BA}(X)$ differ at most by 1.

5. Theorem 5.14 allows us to write $K^n = U_1 \oplus \cdots \oplus U_k$ and $K^n = W_1 \oplus \cdots \oplus W_l$, where the $U_i$ are the eigenspaces for the distinct eigenvalues of $D$ and the $W_j$ are the eigenspaces for the distinct eigenvalues of $D'$. These decompositions are the primary decompositions as in Theorem 5.19, and (e) of that theorem shows that $W_j = (W_j \cap U_1) \oplus \cdots \oplus (W_j \cap U_k)$ for $1 \le j \le l$. Summing on $j$, we see that $K^n$ is the direct sum of all $U_i \cap W_j$. Each of $D$ and $D'$ is scalar on $U_i \cap W_j$, and (a) follows by translating this result into a statement about matrices.

The  matrices  N  —  ^  and  N'  —  commute,  and  both  have  N  uniquely 

as Jordan form. If $C$ were to exist with $C^{-1}NC$ and $C^{-1}N'C$ both in Jordan form, we would have $C^{-1}NC = C^{-1}N'C$ and $N = N'$, contradiction. This answers (b).

6. If $E$ is the projection of $V$ on $U$ along $W$, then each member of $U$ is an eigenvector with eigenvalue 1, and each member of $W$ is an eigenvector with eigenvalue 0. The union of bases of $U$ and $W$ is then a basis of eigenvectors for $E$, and (a) follows from Theorem 5.14. In view of Proposition 5.15, two projections are given by similar matrices if and only if they have the same rank.

7. For (a), $EF = F$ implies image $F \subseteq$ image $E$; conversely, image $F \subseteq$ image $E$ implies $EF = F$, since $E$ is the identity on its image. Reversing the roles of $E$ and $F$, we see that $FE = E$ if and only if image $E \subseteq$ image $F$.

For (b), $EF = E$ implies $\ker F \subseteq \ker E$, while $FE = F$ implies $\ker E \subseteq \ker F$. So $EF = E$ and $FE = F$ imply $\ker E = \ker F$. Conversely if $\ker F \subseteq \ker E$, then $EF = E$ on $\ker F$ and $EF = E$ on image $F$; so $EF = E$. Reversing the roles of $E$ and $F$, we see that $\ker E \subseteq \ker F$ implies $FE = F$.

8. If $EF = FE$, then $(EF)^2 = EFEF = E(FE)F = E(EF)F = E^2F^2 = EF$. So $EF$ is a projection. This proves (a).

For  (b),  let  E  =  ^ ^  j  and  f  =  (*  *).  Each  is  a  projection,  and  EF  —  F,  so 
that $EF$ is a projection. However, $FE = E$. Since $E \ne F$, $EF \ne FE$.

9. If $E$ is a projection, then $U = 2E - I$ has $U^2 = 4E^2 - 4E + I = 4E - 4E + I = I$; so $U$ is an involution. If $U$ is an involution, then $E = \frac{1}{2}(U + I)$ has $E^2 = \frac{1}{4}(U^2 + 2U + I) = \frac{1}{4}(I + 2U + I) = \frac{1}{2}(U + I) = E$. So $E$ is a projection. The two formulas $U = 2E - I$ and $E = \frac{1}{2}(U + I)$ are inverse to each other.

10. Apply Theorem 5.19, and take $U$ to be the primary subspace for the prime polynomial $X$ and $W$ to be the sum of the remaining primary subspaces. Then (i), (ii), and (iii) are immediate from the theorem. For (iv), let $U_j$ be the primary subspace for some other prime polynomial $P(X)$. The theorem shows that $L|_{U_j}$ has a power of $P(X)$ as minimal polynomial. Since $X$ does not divide $P(X)$, Problem 2 shows that $L|_{U_j}$ is invertible. Hence $L$ is invertible on $W$, the direct sum of the $U_j$'s other than the one for the polynomial $X$.

11. Let $V = U_1 \oplus \cdots \oplus U_k$ be the primary decomposition, with $U_1$ corresponding to the prime $X$. By (ii) and Theorem 5.19e, $U = (U_1 \cap U) \oplus \cdots \oplus (U_k \cap U)$, and similarly for $W$. Then $U_j \cap U = 0$ for $j \ge 2$ by (iii), and hence $U \subseteq U_1$. By (iv), $U_1 \cap W = 0$, so that $W \subseteq U_2 \oplus \cdots \oplus U_k$. By (i), $U = U_1$ and $W = U_2 \oplus \cdots \oplus U_k$.

12. Part (a) is immediate, and a basis for (b) consists of the union of bases for the individual $U_j$'s. Part (f) is evident.

For (d) and (e), since $D$ is a linear combination of the $E_j$'s and each $E_j$ is a polynomial in $L$, $D$ is a polynomial in $L$, say $D = P(L)$. Then $N = L - P(L)$ commutes with $L$, and this is (d). Applying the division algorithm to $P$, we have $P = AM + R$ with $R = 0$ or $\deg R < \deg M$. Evaluating at $L$ gives $D = P(L) = A(L)M(L) + R(L) = R(L)$ since $M(L) = 0$. Thus $R$ will serve in place of $P$ if $\deg P \ge \deg M$. This proves the existence in (e) of the polynomial for $D$. Since $N = L - D$, $N$ is a polynomial in $L$, and again we can take this polynomial to be $0$ or to have degree $< \deg M$. This proves the existence in (e) of the polynomial for $N$. For uniqueness, if $P_1$ is a second polynomial that yields $D$, then $0 = D - D = P(L) - P_1(L)$ shows that $P - P_1$ is a multiple of $M$, and the condition on the degrees of $P$ and $P_1$ forces $P - P_1 = 0$. So $P$ is unique. Similarly the polynomial representing $N$ is unique. This completes the proof of uniqueness in (e).

If $Q_j(X)$ is the polynomial $(X - \lambda_j)^{l_j}$, then $N^{l_j} = (L - D)^{l_j} = Q_j(L)$ on $U_j$, and Theorem 5.19f shows that $Q_j(L)$ is $0$ on $U_j$. Therefore a power of $N$ is $0$ on each $U_j$, and $N$ is nilpotent. This proves (c). Part (g) now follows.
13. Each eigenvector of $D$ must lie in some $U_j$ by Theorem 5.19e. If $V_i$ is the eigenspace of $D$ with eigenvalue $c_i$, it follows that $V_i \subseteq U_{j(i)}$ for some $j = j(i)$. Thus each $U_j$ is the sum of full eigenspaces of $D$. Property (d) forces $N$ to carry $V_i$ into itself. By (c), $(L - D)^n$ is $0$ on $V_i$ for $n = \dim V$; hence $(L - c_iI)^n$ is $0$ on $V_i$. Since $V_i \subseteq U_j$, $(L - \lambda_jI)^n$ is $0$ on $U_j$. Application of Problem 10 to $L - c_iI$ shows that $L - \lambda_jI$ is nonsingular on $V_i$ if $c_i \ne \lambda_j$, in contradiction to the fact that $(L - \lambda_jI)^n$ is $0$ on $U_j$, and therefore $c_i = \lambda_j$. The conclusion is that $V_i = U_{j(i)}$, and the desired uniqueness follows.

A slightly shorter argument is available if one takes the constructive proof of existence of a decomposition $L = D + N$ as known, so that Problem 12 is available for that decomposition. If there is a second decomposition $L = D' + N'$ satisfying (a) through (d), then $D'$ and $N'$ commute with $L$ and hence with all polynomials in $L$. Thus they commute with $D$ and $N$. The equality $L = D + N = D' + N'$ implies $D - D' = N' - N$. Problem 5a shows that $D - D'$ has a basis of eigenvectors, and $N' - N$ is nilpotent because the commutativity of $N$ and $N'$ shows that the Binomial Theorem applies, in view of Problem 15 in Chapter I. An operator that has a basis of eigenvectors and is also nilpotent has all its eigenvalues equal to $0$ and hence is $0$. Thus $D - D' = N' - N = 0$.

14. In (a), Lemma 5.22 says that $\det(\lambda I - N') = \lambda^{n'}$. Consequently
$$\det(\lambda I - (N' + cI)) = \det((\lambda - c)I - N') = (\lambda - c)^{n'}.$$

In (b), form the primary decomposition of $L$ as in Theorem 5.19, and let notation be as in Problem 12. On the subspace $U_j$, which is carried to itself by $L$, $L = D + N$ acts as $\lambda_jI + N$, and the characteristic polynomial on that subspace is $(\lambda - \lambda_j)^{n_j}$, by (a). On the whole space $V$, the characteristic polynomial of $L$ is the product of the contributions from each $U_j$, since as a consequence of Proposition 5.11, the determinant of a block diagonal matrix is the product of the determinants of the blocks. Therefore $L$ has characteristic polynomial $\prod_{j=1}^{k} (\lambda - \lambda_j)^{n_j}$, and this matches the characteristic polynomial of $D$.


15. The characteristic polynomial is $\lambda^2 - 2\lambda + 1 = (\lambda - 1)^2$. Since $A - I \ne 0$, the minimal polynomial is $(\lambda - 1)^2$ rather than $\lambda - 1$. Thus the Jordan form is
$$J = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}.$$
Solving shows that $\ker(A - I)$ consists of the multiples of a single nonzero vector $v$. Use $v$ as the first column of $C$, and solve $(A - I)X = v$; any solution $X$ serves as the second column. Then $C = (v \;\; X)$, and one readily checks that $C^{-1}AC = J$.
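The recipe in Problem 15 (take $v$ spanning $\ker(A - I)$, solve $(A - I)X = v$, and use $v$ and $X$ as the columns of $C$) can be sketched in Python with exact rational arithmetic. The matrix `A` below is a hypothetical stand-in with characteristic polynomial $(\lambda - 1)^2$ and $A \ne I$; it is not the matrix of the problem.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def inverse_2x2(M):
    """Exact inverse of an invertible 2-by-2 matrix."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

# Hypothetical A: trace 2 and determinant 1, so its characteristic polynomial is (lambda - 1)^2.
A = [[2, -1], [1, 0]]
A_minus_I = [[A[0][0] - 1, A[0][1]], [A[1][0], A[1][1] - 1]]

v = [1, 1]   # spans ker(A - I)
x = [1, 0]   # one solution of (A - I)x = v
C = [[v[0], x[0]], [v[1], x[1]]]   # columns v and x

J = matmul(matmul(inverse_2x2(C), A), C)
print(J == [[1, 1], [0, 1]])   # True
```

Choosing a different solution $x$ (adding a multiple of $v$) changes $C$ but still yields the same $J$.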


16. The characteristic polynomial is $P(\lambda) = \det(\lambda I - A) = \lambda^3$. Thus $A$ is nilpotent, and in fact $A^2 = 0$. Then
$$J = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix},$$
and the computation proceeds as in Example 1 in Section 7, yielding
$$C = \begin{pmatrix} 4 & 1 & -1 \\ -8 & 0 & 4 \\ 8 & 0 & 0 \end{pmatrix} \quad\text{and}\quad C^{-1} = \begin{pmatrix} 0 & 0 & \tfrac18 \\ 1 & \tfrac14 & -\tfrac14 \\ 0 & \tfrac14 & \tfrac14 \end{pmatrix}.$$

17. The characteristic polynomial is $(\lambda - 2)^6(\lambda - 3)$ by inspection. Thus there is a primary subspace for $\lambda - 2$ with dimension 6 and a primary subspace for $\lambda - 3$ with dimension 1. For the Jordan form let $K_j = \ker(A - 2I)^j$. By raising $A - 2I$ to powers and row reducing, we see that $\dim K_3 = 6$, $\dim K_2 = 5$, and $\dim K_1 = 3$. We do not have to proceed beyond $K_3$ since we have reached the full dimension 6 of the primary subspace for $\lambda - 2$. Therefore the number of Jordan blocks for $\lambda - 2$ of size $\ge 3$ is $6 - 5 = 1$, of size $\ge 2$ is $5 - 3 = 2$, and of size $\ge 1$ is $3$. Hence there is one block of each size 1, 2, and 3, and
$$J = \begin{pmatrix} 2&1&0&0&0&0&0\\ 0&2&1&0&0&0&0\\ 0&0&2&0&0&0&0\\ 0&0&0&2&1&0&0\\ 0&0&0&0&2&0&0\\ 0&0&0&0&0&2&0\\ 0&0&0&0&0&0&3 \end{pmatrix}.$$
Solving $(A - 3I)X = 0$, we find that the eigenvectors for eigenvalue 3 are the multiples of $(5, 2, 2, 3, 2, 1, 1)$. Thus this vector can be taken to be the last column of $C$.

The next step is to express $K_1$, $K_2$, and $K_3$ explicitly in terms of parameters by using the standard solution procedure for systems of homogeneous linear equations. The result is that

Following the method of Example 1 in Section 7, we choose $W_2$ such that $K_3 = K_2 \oplus W_2$, and then we form $U_1 = (A - 2I)(W_2)$:

We choose $W_1$ such that $K_2 = K_1 \oplus U_1 \oplus W_1$, and we form $U_0 = (A - 2I)(U_1 + W_1)$:

Finally we choose $W_0$ such that $K_1 = K_0 \oplus U_0 \oplus W_0$. Here $K_0 = 0$, and we can take $W_0 = \{(0, x_2, 0, 0, 0, 0, 0)\}$.
To form $C$ we take a basis of each $W_j$, apply powers of $A - 2I$ in turn to its members, and line up the resulting columns, along with the eigenvector for eigenvalue 3, as $C$:
$$C = \begin{pmatrix} 2&0&0&1&0&0&5\\ 0&1&0&0&0&1&2\\ 1&0&0&0&0&0&2\\ 0&1&0&0&1&0&3\\ 0&1&0&0&0&0&2\\ 0&0&1&0&0&0&1\\ 0&0&0&0&0&0&1 \end{pmatrix}.$$
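The block count in Problem 17 uses the rule that the number of Jordan blocks of size $\ge k$ for an eigenvalue is $\dim K_k - \dim K_{k-1}$. A minimal Python sketch of that rule follows. The function name `jordan_block_counts` is hypothetical, and the kernel dimensions are assumed to have been computed beforehand by row reduction.

```python
def jordan_block_counts(kernel_dims):
    """Count Jordan blocks by size for one eigenvalue c.

    kernel_dims[k] is dim ker (A - cI)^k for k = 0, 1, ..., m, where kernel_dims[0] = 0
    and kernel_dims[m] is the dimension of the primary subspace.
    Returns a dict {size: number of blocks of exactly that size}.
    """
    m = len(kernel_dims) - 1
    # at_least[k] = number of blocks of size >= k = dim K_k - dim K_{k-1}; padded with 0 at both ends.
    at_least = [0] + [kernel_dims[k] - kernel_dims[k - 1] for k in range(1, m + 1)] + [0]
    counts = {}
    for k in range(1, m + 1):
        exact = at_least[k] - at_least[k + 1]
        if exact:
            counts[k] = exact
    return counts

# Problem 17: dim K_1 = 3, dim K_2 = 5, dim K_3 = 6.
print(jordan_block_counts([0, 3, 5, 6]))   # {1: 1, 2: 1, 3: 1}
```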


18. In (a), if every prime-power factor of the minimal polynomial is of degree 1, then the matrix is similar to a diagonal matrix, and the multiplicities of the eigenvalues can be seen from the characteristic polynomial. If the minimal polynomial is $(X - c)^2$, then the matrix has to be similar to
$$\begin{pmatrix} c & 1 & 0 \\ 0 & c & 0 \\ 0 & 0 & c \end{pmatrix}.$$
If the minimal polynomial instead is $(X - c)^2(X - d)$, then the matrix has to be similar to
$$\begin{pmatrix} c & 1 & 0 \\ 0 & c & 0 \\ 0 & 0 & d \end{pmatrix}.$$
If the minimal polynomial is $(X - c)^3$, then the matrix has to be similar to
$$\begin{pmatrix} c & 1 & 0 \\ 0 & c & 1 \\ 0 & 0 & c \end{pmatrix}.$$
There are no other possibilities.

For (b),
$$\begin{pmatrix} 0&1&0&0\\ 0&0&0&0\\ 0&0&0&0\\ 0&0&0&0 \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} 0&1&0&0\\ 0&0&0&0\\ 0&0&0&1\\ 0&0&0&0 \end{pmatrix}$$
both have minimal polynomial $X^2$ and characteristic polynomial $X^4$, but they are not similar because their ranks are unequal.
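The two matrices in Problem 18(b) can be checked in Python: each is nonzero with square zero, and their ranks differ. The exact rank routine (Gaussian elimination over the rationals) and the helper `unit` are illustrative helpers, not part of the text.

```python
from fractions import Fraction

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def rank(M):
    """Rank by Gaussian elimination with exact Fractions."""
    rows = [[Fraction(a) for a in row] for row in M]
    r = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c] / rows[r][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def unit(n, pairs):
    """n-by-n matrix with 1 in each listed (row, column) position (0-based), 0 elsewhere."""
    return [[1 if (i, j) in pairs else 0 for j in range(n)] for i in range(n)]

N1 = unit(4, {(0, 1)})
N2 = unit(4, {(0, 1), (2, 3)})
zero = [[0] * 4 for _ in range(4)]

for N in (N1, N2):
    # N != 0 and N^2 = 0, so the minimal polynomial is X^2; a nilpotent 4x4 has char. poly X^4.
    assert N != zero and matmul(N, N) == zero
print(rank(N1), rank(N2))   # 1 2
```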


19. If the diagonal entries are $c$ and $N$ denotes the strictly upper-triangular part, then $J^k = (cI + N)^k = \sum_{j=0}^{k} \binom{k}{j} c^{k-j} N^j$. The term from $j = 1$ is not canceled by any other term, and hence $J^k$ is not diagonal.

20. Choose $J$ in Jordan form and $C$ invertible with $J = C^{-1}AC$. Then $J^n = C^{-1}A^nC = C^{-1}C = I$. By Problem 19, every Jordan block in $J$ is of size 1-by-1. Thus $A$ is similar to a diagonal matrix $D$, and each diagonal entry of $D$ must be an $n$th root of unity. Any $n$-tuple of $n$th roots of unity can form the diagonal entries, and the corresponding matrices are similar if and only if the diagonal entries of one are a permutation of those of the other.

21. The minimal polynomial has to divide $X(X^2 - 1) = X(X + 1)(X - 1)$. Hence there is a basis of eigenvectors, the allowable eigenvalues being $1$, $-1$, and $0$. A similarity class is therefore given by an unordered triple of elements from the set $\{1, -1, 0\}$. There are three possibilities for a single eigenvalue, six possibilities for one eigenvalue of multiplicity 2 and one of multiplicity 1, and one possibility with all three eigenvalues present. So the answer is ten.
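The count of ten in Problem 21 is the number of unordered triples (multisets of size 3) drawn from $\{1, -1, 0\}$, namely $\binom{3+3-1}{3} = 10$. A short Python enumeration, offered only as a sanity check, confirms the breakdown $3 + 6 + 1$:

```python
from itertools import combinations_with_replacement

# Each similarity class corresponds to a multiset of three eigenvalues from {1, -1, 0}.
classes = list(combinations_with_replacement([1, -1, 0], 3))

by_distinct = {}
for triple in classes:
    k = len(set(triple))   # number of distinct eigenvalues present
    by_distinct[k] = by_distinct.get(k, 0) + 1

print(len(classes), by_distinct)   # 10 {1: 3, 2: 6, 3: 1}
```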

22. If $A^2 = N$ and $N^n = 0$, then $A^{2n} = 0$. So $A$ is nilpotent and $A^n = 0$. Since $N^{n-1} \ne 0$, $A^{2n-2} \ne 0$. Therefore $n > 2n - 2$, and $n = 1$.


23. If $J$ is of size $n$, then the matrix $C$ with $C_{i,n+1-i} = 1$ for $1 \le i \le n$ and $C_{ij} = 0$ otherwise has $C^{-1}JC = J^t$.
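The reversal matrix of Problem 23 can be checked in Python for small sizes. Since $C$ is a permutation matrix with $C^2 = I$, it is its own inverse, so the sketch tests $CJC = J^t$. Indices are 0-based, so $C_{i,n+1-i} = 1$ becomes `C[i][n - 1 - i] = 1`; the helper names and the diagonal value 7 are illustrative choices.

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def jordan_block(c, n):
    """n-by-n Jordan block: c on the diagonal, 1 just above it."""
    return [[c if j == i else (1 if j == i + 1 else 0) for j in range(n)] for i in range(n)]

def reversal(n):
    """C with C[i][n - 1 - i] = 1 (0-based) and 0 elsewhere."""
    return [[1 if j == n - 1 - i else 0 for j in range(n)] for i in range(n)]

def transpose(M):
    return [list(row) for row in zip(*M)]

for n in range(1, 6):
    J, C = jordan_block(7, n), reversal(n)
    identity = [[int(i == j) for j in range(n)] for i in range(n)]
    assert matmul(C, C) == identity              # so C^{-1} = C
    assert matmul(matmul(C, J), C) == transpose(J)
print("C^{-1} J C = J^t for n = 1, ..., 5")
```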
24. Choose $C$ with $C^{-1}AC = J$ in Jordan form. Problem 23 shows that there is a block-diagonal matrix $B$ with $B^{-1}JB = J^t$. Then $B^{-1}C^{-1}ACB = J^t$ and $C^tA^t(C^{-1})^t = J^t$. So $B^{-1}C^{-1}ACB = C^tA^t(C^{-1})^t$, and the result follows.

25. The matrices $A$ and $B$ have $A^2 = B^2 = 0$ and hence are nilpotent. Since each of $A$ and $B$ has rank 2, $\dim\ker A = \dim\ker B = 2$. The numbers $\dim\ker A^k$ and $\dim\ker B^k$ being equal for all $k$, the two matrices have the same Jordan form and are therefore similar.

26. If $M(X)$ is the minimal polynomial of $L$, then $M(L)v = 0$. Hence $M(X)$ is in $I_v$. Then Proposition 5.8 shows that $M_v(X)$ exists.

27. The polynomial $M_v(X)$ has to divide the minimal polynomial of $L\big|_{V(v)}$, and the latter has degree $\le \dim V(v)$. Hence $\deg M_v(X) \le \dim V(v)$. If $v, L(v), \dots, L^{\deg M_v - 1}(v)$ are linearly dependent, then there is a nonzero polynomial $Q(X)$ of degree $\le \deg M_v - 1$ with $Q(L)(v) = 0$, and that fact contradicts the minimality of the degree of $M_v(X)$. Hence they are independent. They also span $V(v)$, since dividing any polynomial $R(X)$ by $M_v(X)$ shows that $R(L)(v)$ is a linear combination of them; therefore $\deg M_v(X) \ge \dim V(v)$. Thus equality holds, and the linearly independent set is a basis. This proves (a) and (b).

Since $M_v(X)$ divides the minimal polynomial of $L\big|_{V(v)}$, which divides the characteristic polynomial of $L\big|_{V(v)}$, and since the end polynomials have degree $\dim V(v)$, these three polynomials are all equal. This proves (c).

28. Use the ordered basis $(L^{d-1}(v), L^{d-2}(v), \dots, L(v), v)$.

29. Since $P(X)$ is prime and does not divide $Q(X)$, there exist polynomials $A(X)$ and $B(X)$ with $A(X)P(X) + B(X)Q(X) = 1$. Using the substitution that sends $X$ to $L$ and applying both sides to $v$, we obtain $B(L)Q(L)(v) = v$, since $P(L)(v) = 0$. Hence $V(Q(L)(v)) \supseteq V(v)$. Since the reverse inclusion is clear, the result follows.
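The polynomials $A(X)$ and $B(X)$ in Problem 29 can be computed by the extended Euclidean algorithm. The Python sketch below works over $\mathbb Q$ with coefficient lists (constant term first); the example $P(X) = X^2 + 1$, $Q(X) = X + 1$ and all function names are assumptions made for illustration.

```python
from fractions import Fraction

def trim(p):
    """Drop trailing zero coefficients; [] represents the zero polynomial."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def poly_sub(a, b):
    n = max(len(a), len(b))
    return trim([(a[i] if i < len(a) else 0) - (b[i] if i < len(b) else 0) for i in range(n)])

def poly_mul(a, b):
    if not a or not b:
        return []
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return trim(out)

def poly_divmod(a, b):
    """Division algorithm: a = q*b + r with r = [] or deg r < deg b (b must be nonzero)."""
    r, b = trim([Fraction(x) for x in a]), trim([Fraction(x) for x in b])
    q = [Fraction(0)] * max(len(r) - len(b) + 1, 1)
    while len(r) >= len(b):
        shift, coef = len(r) - len(b), r[-1] / b[-1]
        q[shift] = coef
        for i, y in enumerate(b):
            r[i + shift] -= coef * y
        r = trim(r)
    return trim(q), r

def ext_gcd(p, q):
    """Return (g, A, B) with A*p + B*q = g, where g is the monic gcd of p and q."""
    r0, r1 = trim([Fraction(x) for x in p]), trim([Fraction(x) for x in q])
    s0, s1, t0, t1 = [Fraction(1)], [], [], [Fraction(1)]   # invariant: s*p + t*q = r
    while r1:
        quo, rem = poly_divmod(r0, r1)
        r0, r1 = r1, rem
        s0, s1 = s1, poly_sub(s0, poly_mul(quo, s1))
        t0, t1 = t1, poly_sub(t0, poly_mul(quo, t1))
    lead = r0[-1]
    return [c / lead for c in r0], [c / lead for c in s0], [c / lead for c in t0]

P, Q = [1, 0, 1], [1, 1]   # hypothetical example: P = X^2 + 1 (prime over Q), Q = X + 1
g, A, B = ext_gcd(P, Q)
print(g, A, B)
```

Here $g = 1$ because $P$ does not divide $Q$, which is exactly the situation the problem uses.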

30. In (a), the base case of the induction is that $\dim V = \deg P(X)$, and then the result follows from Problem 27. For the inductive step, the same problem shows that there must be a nontrivial invariant subspace $U$. Proposition 5.12 shows that the minimal polynomial for $U$ and $V/U$ is $P(X)$, and induction shows that the characteristic polynomial for $U$ and $V/U$ is a power of $P(X)$. Proposition 5.11 then shows that the characteristic polynomial for $V$ is a power of $P(X)$.

For (b), we induct on $l$, using (a) to handle the case $l = 1$. For general $l$, form the invariant subspace $U = \ker P(L)^{l-1}$, for which the minimal polynomial is some $P(X)^r$ with $r < l$. The minimal polynomial of $V/U$ is certainly $P(X)$. By induction, $U$ and $V/U$ have characteristic polynomials equal to powers of $P(X)$, and Proposition 5.11 shows that the same thing is true for $V$.

In (c), (b) says that the characteristic polynomial is of the form $P(X)^r$ for some $r$. Then the degree of the characteristic polynomial is $rd$, where $d = \deg P(X)$.

32-34.  These  are  proved  word-for-word  in  the  same  way  as  Lemmas  5.23  through 
5.25  except  that  n  is  to  be  replaced  by  l  and  N  is  to  be  replaced  by  P(L). 
35. If $Q(X)$ is in $K[X]$, we successively apply the division algorithm to write
$$\begin{aligned} Q &= A_0P + B_0 &&\text{with } \deg B_0 < \deg P, \\ A_0 &= A_1P + B_1 &&\text{with } \deg B_1 < \deg P, \\ A_1 &= A_2P + B_2 &&\text{with } \deg B_2 < \deg P, \end{aligned}$$
etc., and then we substitute and find that
$$\begin{aligned} Q &= A_0P + B_0 = A_1P^2 + B_1P + B_0 = A_2P^3 + B_2P^2 + B_1P + B_0 \\ &= \cdots = A_jP^{j+1} + B_jP^j + \cdots + B_2P^2 + B_1P + B_0, \end{aligned}$$
with each $B_i$ equal to $0$ or of degree $< \deg P$. The fact that $v \in W_j \subseteq K_{j+1}$ implies that $P(L)^{j+1}(v) = 0$. Consequently
$$V(v) = \{(B_jP^j + \cdots + B_1P + B_0)(L)(v) \mid B_i = 0 \text{ or } \deg B_i < d \text{ for } 0 \le i \le j\},$$
and the given set spans $V(v)$.
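The repeated division in Problem 35 is the expansion of $Q$ in powers of $P$. A Python sketch over $\mathbb Q$ follows; coefficient lists are constant term first, and the function names and the sample $P$, $Q$ are hypothetical.

```python
from fractions import Fraction

def trim(p):
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def poly_divmod(a, b):
    """Division algorithm: a = q*b + r with r = [] or deg r < deg b (b must be nonzero)."""
    r, b = trim([Fraction(x) for x in a]), trim([Fraction(x) for x in b])
    q = [Fraction(0)] * max(len(r) - len(b) + 1, 1)
    while len(r) >= len(b):
        shift, coef = len(r) - len(b), r[-1] / b[-1]
        q[shift] = coef
        for i, y in enumerate(b):
            r[i + shift] -= coef * y
        r = trim(r)
    return trim(q), r

def p_adic_digits(Q, P):
    """Return [B_0, B_1, ...] with Q = B_0 + B_1 P + B_2 P^2 + ..., each B_i = [] or deg B_i < deg P.

    Mirrors Q = A_0 P + B_0, A_0 = A_1 P + B_1, A_1 = A_2 P + B_2, ...
    """
    digits, A = [], trim(Q)
    while A:
        A, B = poly_divmod(A, P)
        digits.append(B)
    return digits

def poly_add(a, b):
    n = max(len(a), len(b))
    return trim([(a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0) for i in range(n)])

def poly_mul(a, b):
    if not a or not b:
        return []
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return trim(out)

def from_digits(digits, P):
    """Rebuild B_0 + B_1 P + ... + B_j P^j by Horner's rule in P."""
    total = []
    for B in reversed(digits):
        total = poly_add(poly_mul(total, P), B)
    return total

P = [1, 0, 1]                # hypothetical P = X^2 + 1
Q = [3, 0, 2, 5, 0, 1]       # hypothetical Q = X^5 + 5X^3 + 2X^2 + 3
digits = p_adic_digits(Q, P)
print(digits)
```

Each digit has degree below $\deg P$, matching the condition on the $B_i$ in the text.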

For the linear independence suppose that some such expression is $0$ with not all $B_i(X)$ equal to $0$. Fix $i$ as small as possible with $B_i(X) \ne 0$. Since $P(L)^{j+1}(v) = 0$, $B_r(L)P(L)^r(v)$ is annihilated by $P(L)^{j-i}$ if $r > i$. Application of $P(L)^{j-i}$ to the dependence relation yields
$$P(L)^{j-i}\big(B_j(L)P(L)^j(v) + \cdots + B_{i+1}(L)P(L)^{i+1}(v) + B_i(L)P(L)^i(v)\big) = 0$$
and therefore also $B_i(L)P(L)^j(v) = 0$. Since $\deg B_i < \deg P$, Problem 29 shows that $P(L)^j(v) = 0$. Therefore $v$ is in $K_j$. Since $W_j \cap K_j = 0$, we conclude $v = 0$, contradiction.

36. We show at the same time that it is possible to arrange for each $U_j$ and $W_j$ to be such that $K_j + U_j$ and $K_j + W_j$ are invariant under $L$. We proceed by induction downward on $j$. The construction begins with $U_{l-1} = 0$ and $W_{l-1}$ chosen such that $K_l = K_{l-1} \oplus W_{l-1}$. Then we have $L(W_{l-1}) \subseteq W_{l-1} + K_{l-1}$ and $L(U_{l-1}) \subseteq U_{l-1} + K_{l-1}$. Select some $v_1^{(l-1)} \ne 0$ in $W_{l-1}$. If there is a polynomial $B(X) \ne 0$ with $\deg B < \deg P$ such that $B(L)(v_1^{(l-1)})$ is in $K_{l-1}$, then it follows from Problem 29 and the invariance of $K_{l-1}$ under $L$ that $v_1^{(l-1)}$ is in $K_{l-1}$, contradiction. So there is no such polynomial, and the vectors $v_1^{(l-1)}, L(v_1^{(l-1)}), \dots, L^{d-1}(v_1^{(l-1)})$ are linearly independent with span $T_1^{(l-1)}$ such that $K_{l-1} + T_1^{(l-1)}$ is a direct sum.

If $K_{l-1} + T_1^{(l-1)} \ne K_l$, then we form $v_2^{(l-1)}$ and $T_2^{(l-1)}$ in the same way. If there is a polynomial $B(X) \ne 0$ with $\deg B < \deg P$ such that $B(L)(v_2^{(l-1)})$ is in $K_{l-1} + T_1^{(l-1)}$, then Problem 29 shows that $v_2^{(l-1)}$ is in $K_{l-1} + T_1^{(l-1)}$, contradiction. We conclude that $K_{l-1} + T_1^{(l-1)} + T_2^{(l-1)}$ is a direct sum. Continuing in this way, we obtain enough linearly independent vectors to have a basis for a complement $W_{l-1} = T_1^{(l-1)} + T_2^{(l-1)} + \cdots$ to $K_{l-1}$.

Now suppose inductively in the construction of $U_j$ and $W_j$ that $j \le l - 2$ and that $U_{j+1} + K_{j+1}$ and $W_{j+1} + K_{j+1}$ are invariant under $L$. We define $U_j = P(L)(U_{j+1} \oplus W_{j+1})$, and the assumed invariance implies that $U_j + K_j$ is invariant under $L$. We now construct $W_j$ in the same way that we constructed $W_{l-1}$, insisting that $(U_j + K_j) \cap W_j = 0$. If we choose $v_1^{(j)}$ in $K_{j+1}$ but not in $U_j + K_j$, then the invariance of $U_j + K_j$ under $L$ implies that the vectors $v_1^{(j)}, \dots, L^{d-1}(v_1^{(j)})$ are linearly independent and their linear span $T_1^{(j)}$ is such that $U_j + K_j + T_1^{(j)}$ is a direct sum. Continuing in this way, we obtain the required basis of a complement $W_j$ to $K_j \oplus U_j$.

37. Problem 36 arranges that the vectors $L^r(v_i^{(j)})$ for $0 \le r \le d - 1$ and all $i$ form a basis of $W_j$. We show by induction downward for $j \le l - 1$ that the vectors $L^rP(L)^k(v_i^{(j+k)})$ for $0 \le r \le d - 1$, $k > 0$, and all $i$ form a basis of $U_j$. This holds for $j = l - 1$ since $U_{l-1} = 0$. If it is true for $j + 1$, then $U_{j+1} \oplus W_{j+1}$ has a basis consisting of all $L^rP(L)^k(v_i^{(j+1+k)})$ for $0 \le r \le d - 1$, $k \ge 0$, and all $i$. Since Problem 33 shows that $P(L)$ is one-one from $U_{j+1} \oplus W_{j+1}$ onto $U_j$, $U_j$ has a basis consisting of all $L^rP(L)^{k+1}(v_i^{(j+1+k)})$ for $0 \le r \le d - 1$, $k \ge 0$, and all $i$, i.e., all $L^rP(L)^k(v_i^{(j+k)})$ for $0 \le r \le d - 1$, $k > 0$, and all $i$. This completes the induction.

38. Problem 35 gives a basis for the cyclic subspace generated by $v_i^{(j)}$. Problem 37 shows that the members within $U_j \oplus W_j$ of the union of these bases, as the generators vary, form a basis of $U_j \oplus W_j$, and Problem 34 allows us to conclude that as $j$ varies, we obtain a basis of $V$.

39. Because of the linear independence proved in Problem 38, the left side of the formula in question equals the number of vectors $v_i^{(k)}$ in any $W_k$ with $k \ge j$, which equals $\sum_{k \ge j} (\dim W_k)/d$. Iterated application of Problem 33 gives
$$\dim K_{j+1} - \dim K_j = \dim U_j + \dim W_j = \dim U_{j+1} + \dim W_{j+1} + \dim W_j = \cdots = \sum_{k \ge j} \dim W_k,$$
and the result follows.

40. The minimal polynomial for any cyclic subspace must divide the minimal polynomial for $V$ and hence must be a power of $P(X)$. Problem 28 shows that the restrictions of $L$ to any two cyclic subspaces with the same minimal polynomial are isomorphic. Hence the decomposition into cyclic subspaces will be unique up to isomorphism as soon as it is proved that the number of cyclic direct summands with minimal polynomial of the form $P(X)^k$ with $k \ge j + 1$ equals $(\dim K_{j+1} - \dim K_j)/d$.

Suppose that $V$ is the direct sum of cyclic subspaces $C_i$, with $v_i$ as the generator of $C_i$. Since each $C_i$ is invariant under $L$, each $K_j$ is the direct sum of the subspaces $K_j \cap C_i$. Thus
$$\dim K_{j+1} - \dim K_j = \sum_i \big(\dim(K_{j+1} \cap C_i) - \dim(K_j \cap C_i)\big).$$
If $P(X)^k$ is the minimal polynomial of $C_i$, it is enough to show that the $i$th term on the right side of this displayed formula equals $d$ if $k \ge j + 1$ and equals $0$ if $k \le j$. By Problem 35, $C_i$ has a basis consisting of all vectors $L^rP(L)^s(v_i)$ with $0 \le r \le d - 1$ and $0 \le s \le k - 1$. The nonzero vectors among the $L^rP(L)^{s+j+1}(v_i)$ are still linearly independent; these are the ones with $s + j + 1 < k$, i.e., $s < k - j - 1$. Hence the vectors $L^rP(L)^s(v_i)$ that are sent to $0$ by $P(L)^{j+1}$ are a basis of $K_{j+1} \cap C_i$. These are the ones with $s \ge k - j - 1$. This is the full basis of $C_i$ if $j + 1 \ge k$, and there are $d(j + 1)$ such vectors if $j + 1 < k$. Thus
$$\dim(K_{j+1} \cap C_i) = \begin{cases} dk & \text{if } j + 1 \ge k, \\ d(j + 1) & \text{if } j + 1 < k. \end{cases}$$
Similarly
$$\dim(K_j \cap C_i) = \begin{cases} dk & \text{if } j \ge k, \\ dj & \text{if } j < k. \end{cases}$$
Subtracting and taking the cases into account, we see that
$$\dim(K_{j+1} \cap C_i) - \dim(K_j \cap C_i) = \begin{cases} d & \text{if } j + 1 \le k, \\ 0 & \text{otherwise}. \end{cases}$$


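41. (a) $\begin{pmatrix} \cdots & \cdots \\ \cdots & \cos t \end{pmatrix}$, (b) $\begin{pmatrix} \cosh t & \sinh t \\ \sinh t & \cosh t \end{pmatrix}$, (c) the diagonal matrix with diagonal entries $e^{d_1t}, \dots, e^{d_nt}$.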

42. Suppose that $J$ has diagonal entry $c$. Let $N$ be the strictly upper-triangular part of $J$. Then $e^{tJ} = e^{tcI + tN} = e^{tc}e^{tN}$. Here
$$e^{tN} = I + tN + \tfrac{1}{2!}t^2N^2 + \cdots + \tfrac{1}{(n-1)!}t^{n-1}N^{n-1}$$
since $N^n = 0$. The powers of $N$ were observed to have the diagonal of 1's move one step at a time up and to the right.
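Problem 42's formula says that the $(i, i+k)$ entry of $e^{tJ}$ is $e^{tc}t^k/k!$ and the entries below the diagonal are $0$. A Python sketch compares that closed form with a truncated power series for $e^{tJ}$; the sample values $c = 0.5$, $n = 4$, $t = 1.3$ and the 60-term truncation are arbitrary choices.

```python
import math

def expm_jordan_block(c, n, t):
    """e^{tJ} for the n-by-n Jordan block J = cI + N via e^{tc} * sum_{k<n} t^k N^k / k!."""
    scale = math.exp(t * c)
    return [[scale * t ** (j - i) / math.factorial(j - i) if j >= i else 0.0
             for j in range(n)] for i in range(n)]

def expm_series(M, t, terms=60):
    """Truncated series sum_{m < terms} (tM)^m / m! for a small square matrix M."""
    n = len(M)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for m in range(1, terms):
        # term becomes (tM)^m / m! from (tM)^(m-1) / (m-1)!
        term = [[sum(term[i][k] * M[k][j] for k in range(n)) * t / m for j in range(n)]
                for i in range(n)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

c, n, t = 0.5, 4, 1.3
J = [[c if j == i else (1.0 if j == i + 1 else 0.0) for j in range(n)] for i in range(n)]
closed, series = expm_jordan_block(c, n, t), expm_series(J, t)
print(max(abs(closed[i][j] - series[i][j]) for i in range(n) for j in range(n)))
```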

43. $\dfrac{d}{dt}(e^{tA}v) = (Ae^{tA})v = A(e^{tA}v)$.

44. Suppose that $y(t)$ is a solution. The product rule for derivatives is valid in this situation by the usual derivation. Hence
$$\frac{d}{dt}\big(e^{-tA}y(t)\big) = \Big(\frac{d}{dt}e^{-tA}\Big)y(t) + e^{-tA}y'(t) = -e^{-tA}Ay(t) + e^{-tA}y'(t) = e^{-tA}\big(-Ay(t) + y'(t)\big).$$
The right side is $0$ since $y(t)$ solves the differential equation. Since $\frac{d}{dt}(e^{-tA}y(t)) = 0$, each component of $e^{-tA}y(t)$ is constant. Thus for a suitable vector $v$ of complex constants, $e^{-tA}y(t) = v$, and the conclusion is that $y(t) = e^{tA}v$.

45. The first formula follows by making a term-by-term calculation with the defining series. Multiplication by $C$ has to be interchanged with the infinite sum, and similarly for $C^{-1}$, but these operations are simply the operations of taking certain linear combinations of limits.

Suppose that $z(t)$ satisfies $\frac{d}{dt}z(t) = (C^{-1}AC)z(t)$ and $z(0) = u$. Multiplying by $C$ gives $\frac{d}{dt}Cz(t) = ACz(t)$. Thus $y(t) = Cz(t)$ satisfies $\frac{d}{dt}y(t) = Ay(t)$ and $y(0) = Cz(0) = Cu$. We can invert the correspondence by using $C^{-1}$.

46. Example 3 in Section 7 says that $C^{-1}AC = J$ holds for the matrices $J$ and $C$ found there; the first two rows of $J$ are $(3, 1, 0)$ and $(0, 3, 0)$. Define $u$ to be $C^{-1}$ applied to the given initial vector. Problems 42 and 43 show that the unique solution of $\frac{d}{dt}z(t) = Jz(t)$ with $z(0) = u$ is $z(t) = e^{tJ}u$. Problem 45 shows that the unique solution to $\frac{d}{dt}y(t) = Ay(t)$ with $y(0) = Cu$ is $y(t) = Cz(t) = Ce^{tJ}u$. By Problem 42, this is


Chapter  VI 

1. In (a), the linear function $\varphi : V \to V'$ given by $\varphi(v) = \langle v, \cdot\,\rangle$ has kernel equal to the left radical of the bilinear form, hence $0$. Therefore $\varphi$ is one-one, and $\dim \operatorname{image}\varphi = \dim V = \dim V'$. Since $\dim V' < \infty$, $\varphi$ is onto $V'$. In (b), $m(v, \cdot\,)$ is a linear functional and by (a) is of the form $m(v, u) = \langle w, u\rangle$ for some unique $w$ depending on $v$. Set $w = L(v)$. The uniqueness shows that $L(v_1 + v_2) = L(v_1) + L(v_2)$ and $L(cv) = cL(v)$. Hence $L$ is linear.

2. Since $M^tAM$ would have to be nonsingular, the only possibility would be $M^tAM$ equal to the identity. Writing $M^{-1}$ as $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$, we obtain the conditions $a + c = b + d = 0$ and $ab + cd = 1$. A check of cases shows that these have no solution.

3.  Take  M  =  (“]  j). 

5. Define $(a + bi)w = aw + bJ(w)$ for $a$ and $b$ real. The crucial property to show in order to obtain a complex vector space is that $((a + bi)(c + di))(w) = (a + bi)((c + di)w)$; expansion of both sides shows that both sides are equal to $(ac - bd)w + (bc + ad)J(w)$ since $J^2 = -I$. Thus $W = V^{\mathbb R}$ for a suitable $V$.

Next define $(v, w) = \langle J(v), w\rangle + i\langle v, w\rangle$. This is bilinear over $\mathbb R$. It is complex linear in the first variable because
$$(J(v), w) = \langle J^2(v), w\rangle + i\langle J(v), w\rangle = -\langle v, w\rangle + i\langle J(v), w\rangle = i(v, w).$$
It is Hermitian because
$$(w, v) = \langle J(w), v\rangle - i\langle v, w\rangle = \langle J^2(w), J(v)\rangle - i\langle v, w\rangle = -\langle w, J(v)\rangle - i\langle v, w\rangle = \langle J(v), w\rangle - i\langle v, w\rangle = \overline{(v, w)}.$$

6. For (a), $U$ isotropic implies $U^\perp \supseteq U$. If $v$ is a vector in $U^\perp$ but not in $U$, then $U \oplus Kv$ is isotropic. Maximality thus implies that $U^\perp = U$. Proposition 6.3 says that $\dim V = \dim U + \dim U^\perp$, and we conclude that $\dim V = 2\dim U$. So $\dim U = n$.

The proof of (b) goes by induction on the dimension, the base case being dimension 2, where there is no problem. Assuming the result for spaces of dimension less than $\dim V$, let $S_1$ be maximal isotropic in $V$, so that $\dim S_1 = \tfrac12\dim V$ by (a). Fix a basis $\{v_1, \dots, v_n\}$ of $S_1$. Choose $u_1$ with $\langle v_1, u_1\rangle = 1$; this exists by nondegeneracy. Put $U = Kv_1 \oplus Ku_1$. Then $\langle\cdot,\cdot\rangle\big|_{U\times U}$ is evidently nondegenerate, and Corollary 6.4 shows that $V = U \oplus U^\perp$. Certainly $S_1 \cap U^\perp$ is an isotropic subspace of $U^\perp$. It contains the $n - 1$ linearly independent elements $v_j - \langle v_j, u_1\rangle v_1$ for $2 \le j \le n$ and hence has dimension $\ge n - 1$. Therefore it is maximal isotropic. By induction, there is a maximal isotropic subspace $T$ of $U^\perp$ with $(S_1 \cap U^\perp) \cap T = 0$. Put $S_2 = T \oplus Ku_1$. Since $\langle u_1, U^\perp\rangle = 0$, $\langle u_1, T\rangle = 0$. Therefore $S_2$ is isotropic, hence maximal isotropic in $V$. Suppose that the element $t + cu_1$ of $S_2$ lies in $S_1$. From $\langle v_1, t + cu_1\rangle = 0$, $v_1 \in U$, $t \in U^\perp$, and $\langle v_1, u_1\rangle = 1$, we obtain $c = 0$. Then $t + cu_1$ lies in $(S_1 \cap U^\perp) \cap T$, which is $0$. We conclude that $S_1 \cap S_2 = 0$.

For (c), if $\langle\cdot, s_2\rangle$ is the $0$ function on $S_1$, then $s_2$ lies in $S_1^\perp$, which equals $S_1$ because $S_1$ is maximal isotropic; hence $s_2 \in S_1 \cap S_2 = 0$. Therefore the mapping $s_2 \mapsto \langle\cdot, s_2\rangle\big|_{S_1}$ is one-one. A count of dimensions shows that it is onto $S_1'$.

In (d), choose any basis $\{p_1, \dots, p_n\}$ of $S_1$, and let $\{q_1, \dots, q_n\}$ be the dual basis of $S_1'$, which has been identified with $S_2$ by (c).

7. In (a), first suppose that $h : \bigoplus_s U_s \to V$ is given. Then $hi_s$ is in $\operatorname{Hom}_K(U_s, V)$, and the map from left to right may be taken to be $h \mapsto \{hi_s\}_{s\in S}$. Next suppose that $h_s : U_s \to V$ is given for each $s$. Then the universal mapping property of $\bigoplus_s U_s$ supplies $h : \bigoplus_s U_s \to V$ with $hi_s = h_s$ for all $s$. The map from right to left may be taken as $\{h_s\}_{s\in S} \mapsto h$. These two maps invert each other.

In (b), first suppose that $h_s : U \to V_s$ is given for each $s$. Then the universal mapping property of the direct product produces $h : U \to \prod_s V_s$. The map from right to left may be taken as $\{h_s\}_{s\in S} \mapsto h$. Next suppose that $h : U \to \prod_s V_s$ is given. Then $p_sh$ is in $\operatorname{Hom}_K(U, V_s)$ for each $s \in S$. Consequently the $S$-tuple $\{p_sh\}_{s\in S}$ is in $\prod_s \operatorname{Hom}_K(U, V_s)$. Then the map from left to right can be taken as $h \mapsto \{p_sh\}_{s\in S}$. These two maps invert each other.

For (c), we treat (a) and (b) separately. In the case of (a), take $S$ countably infinite with each $U_s = K$ and with $V = K$. Then $\operatorname{Hom}_K(\bigoplus_{s\in S} U_s, V)$ has uncountable dimension and $\bigoplus_{s\in S}\operatorname{Hom}_K(U_s, V)$ has countable dimension.

In the case of (b), take $S$ to be countably infinite with each $V_s = K$ and with $U = \bigoplus_s V_s$. Each member of $\operatorname{Hom}_K(U, V_{s_0})$ has its values in $V_{s_0}$, and hence each member of $\bigoplus_s \operatorname{Hom}_K(U, V_s)$ has its values in finitely many $V_s$. On the other hand, the identity function from $U$ into $\bigoplus_s V_s$ is in $\operatorname{Hom}_K(U, \bigoplus_s V_s)$ and takes values in all $V_s$'s.




8. For (a), we have $g_1(g_2(x)) = g_1(g_2xg_2^t) = g_1g_2xg_2^tg_1^t = (g_1g_2)x(g_1g_2)^t = (g_1g_2)(x)$. If $x$ is alternating, then $(gxg^t)^t = gx^tg^t = -gxg^t$ and, using $x_{jj} = 0$ and $x_{kj} = -x_{jk}$,
$$(gxg^t)_{ii} = \sum_{j,k} g_{ij}x_{jk}g_{ik} = \sum_{j<k} g_{ij}x_{jk}g_{ik} + \sum_{j>k} g_{ij}x_{jk}g_{ik} = \sum_{j<k} g_{ij}(x_{jk} - x_{jk})g_{ik} = 0;$$
hence $gxg^t$ is alternating. If $x$ is symmetric, then $(gxg^t)^t = gx^tg^t = gxg^t$, and $gxg^t$ is symmetric.

For (b), certainly $x$ and $gxg^t$ have the same rank if $g$ is nonsingular. Theorem 6.7 shows that an alternating matrix $x$ can be transformed by some nonsingular $g$ to a matrix $gxg^t$ that is block diagonal with $k$ blocks of the form $\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$, where $2k$ is the rank, followed by $0$'s down the diagonal. This proves that any two alternating matrices of the same rank lie in the same orbit. It also gives an example of a matrix in each orbit.

For (c), certainly $x$ and $gxg^t$ have the same rank if $g$ is nonsingular. The Principal Axis Theorem (Theorem 6.5) shows that any symmetric matrix over $\mathbb C$ can be transformed by some nonsingular $g$ to a matrix $gxg^t$ that is diagonal, say with diagonal entries $d_1, \dots, d_n$. We may assume that $d_1, \dots, d_k$ are nonzero and the others are $0$. Taking $h$ to be the nonsingular diagonal matrix with diagonal entries $(d_1^{-1/2}, \dots, d_k^{-1/2}, 1, \dots, 1)$, for any choice of complex square roots, and forming $h(gxg^t)h^t$, we obtain a diagonal matrix in the same orbit whose first $k$ diagonal entries are $1$ and whose other diagonal entries are $0$. As $k$ varies, these matrices have different ranks and hence lie in different orbits. They provide examples of matrices in each orbit.

9. In (a), the formula is $T_{UV}\big(\sum_i (u_i' \otimes v_i)\big)(u) = \sum_i u_i'(u)v_i$, and we may assume that $\{v_i\}$ is linearly independent. If this is $0$ for all $u$, then the linear independence of the $v_i$'s implies that $u_i'(u) = 0$ for all $i$ and all $u$. Then all $u_i'$ are $0$, and hence $\sum_i (u_i' \otimes v_i) = 0$. Thus $T_{UV}$ is one-one.

In (b), Problem 7a shows that it is enough to handle $U = K$. Thus we are to show that $K' \otimes_K V$ maps onto $\operatorname{Hom}_K(K, V) \cong V$. One member of $K'$ is the identity function $1'$ on $K$, and $1' \otimes V$ certainly maps onto $V$.

For (c), if $U = V$ and if $\dim U$ is infinite, every member of the image of $T_{UU}$ has finite rank, but $\operatorname{Hom}_K(U, U)$ contains the identity function, which has infinite rank.

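In (d), let $L : U_1 \to U$ and $M : V \to V_1$ be given, so that $F(L, M)$ carrying $U' \otimes_K V$ to $U_1' \otimes_K V_1$ is given by $F(L, M)(u' \otimes v) = L^t(u') \otimes M(v)$ and $G(L, M)$ carrying $\operatorname{Hom}_K(U, V)$ to $\operatorname{Hom}_K(U_1, V_1)$ has $(G(L, M)(\varphi))(u_1) = M(\varphi(L(u_1)))$. Then
$$\begin{aligned}
T_{U_1V_1}F(L, M)(u' \otimes v)(u_1) &= T_{U_1V_1}\big(L^t(u') \otimes M(v)\big)(u_1) = L^t(u')(u_1)\,M(v) = u'(L(u_1))\,M(v), \\
G(L, M)T_{UV}(u' \otimes v)(u_1) &= M\big((T_{UV}(u' \otimes v))(L(u_1))\big) = M\big(u'(L(u_1))\,v\big) = u'(L(u_1))\,M(v).
\end{aligned}$$
The right sides are equal, and hence $\{T_{UV}\}$ is a natural transformation.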

In (e), the answer is no because the maps $T_{UV}$ need not be isomorphisms, according to (c).
10. To see that $\Psi(E) = \operatorname{Hom}_K(L, E)$ is a vector space over $L$, one has to verify that $(l + l')\varphi = l\varphi + l'\varphi$, $l(\varphi + \varphi') = l\varphi + l\varphi'$, and $(ll')\varphi = l(l'\varphi)$, and these are all routine. If $\mu$ is in $\operatorname{Hom}_K(E, F)$, then $\Psi(\mu) : \operatorname{Hom}_K(L, E) \to \operatorname{Hom}_K(L, F)$ has to be given by composition on the left with $\mu$, and the key step is to show that $\Psi(\mu)$ is $L$ linear, not merely $K$ linear. For $\varphi$ in $\operatorname{Hom}_K(L, E)$ and $l, l'$ in $L$, we have
$$(\Psi(\mu)(l\varphi))(l') = \mu((l\varphi)(l')) = \mu(\varphi(ll')) = (\Psi(\mu)\varphi)(ll') = (l(\Psi(\mu)\varphi))(l').$$
Hence $\Psi(\mu)(l\varphi) = l(\Psi(\mu)\varphi)$ as required. It is routine to check that $\Psi(1) = 1$ and that $\mu \mapsto \Psi(\mu)$ respects compositions, and hence $\Psi$ is a functor.

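11. Let $\Gamma = (v_1, \dots, v_n)$ be an ordered basis of $E$, $\Delta = (w_1, \dots, w_m)$ be an ordered basis of $F$, and $A = [A_{ij}]$ be the matrix of $L$ in these ordered bases. Put $\Gamma^{\mathbb R} = (v_1, iv_1, \dots, v_n, iv_n)$ and $\Delta^{\mathbb R} = (w_1, iw_1, \dots, w_m, iw_m)$. Then the matrix of $L^{\mathbb R}$ in these ordered bases is obtained by replacing $A_{ij}$ by the 2-by-2 block
$$\begin{pmatrix} \operatorname{Re} A_{ij} & -\operatorname{Im} A_{ij} \\ \operatorname{Im} A_{ij} & \operatorname{Re} A_{ij} \end{pmatrix}.$$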
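The block rule of Problem 11 can be checked numerically: realifying $A$ and then applying it to the real coordinates of $x$ in $(v_1, iv_1, \dots, v_n, iv_n)$ should give the real coordinates of $Ax$. A Python sketch with an arbitrary complex matrix follows; the names `realify` and `real_coords` are hypothetical.

```python
def realify(A):
    """Replace each complex entry a of A by the 2x2 block [[Re a, -Im a], [Im a, Re a]]."""
    m, n = len(A), len(A[0])
    R = [[0.0] * (2 * n) for _ in range(2 * m)]
    for i in range(m):
        for j in range(n):
            a = A[i][j]
            R[2 * i][2 * j], R[2 * i][2 * j + 1] = a.real, -a.imag
            R[2 * i + 1][2 * j], R[2 * i + 1][2 * j + 1] = a.imag, a.real
    return R

def real_coords(x):
    """Coordinates of sum_j x_j v_j in the ordered basis (v_1, i v_1, ..., v_n, i v_n)."""
    out = []
    for z in x:
        out += [z.real, z.imag]
    return out

def matvec(M, x):
    return [sum(row[k] * x[k] for k in range(len(x))) for row in M]

A = [[1 + 2j, -3j], [2 - 1j, 4 + 0j], [0.5j, 1 - 1j]]   # arbitrary 3-by-2 complex matrix
x = [2 - 1j, 1 + 3j]
lhs = matvec(realify(A), real_coords(x))
rhs = real_coords(matvec(A, x))
print(all(abs(p - q) < 1e-12 for p, q in zip(lhs, rhs)))   # True
```

The block $\begin{pmatrix} a & -b \\ b & a \end{pmatrix}$ is the matrix of multiplication by $a + bi$ on $\mathbb C = \mathbb R \oplus \mathbb R i$, which is why the check succeeds.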

12. Let $\Gamma_1 = (u_1, \dots, u_m)$ and $\Delta_1 = (v_1, \dots, v_n)$, and put

$$\Omega_1 = (u_1 \otimes v_1, u_1 \otimes v_2, \dots, u_1 \otimes v_n, u_2 \otimes v_1, \dots, u_2 \otimes v_n, \dots, u_m \otimes v_n).$$

Form $\Omega_2$ from the ordered bases $\Gamma_2$ and $\Delta_2$ similarly. Members of $\Omega_1$ are indexed by pairs $(i, j)$ with $1 \le i \le m$ and $1 \le j \le n$, and members of $\Omega_2$ are indexed similarly by pairs $(r, s)$. Then $C_{(r,s),(i,j)} = A_{ri}B_{sj}$.
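The formula $C_{(r,s),(i,j)} = A_{ri}B_{sj}$ says that, with index pairs ordered lexicographically as in $\Omega_1$, the matrix of the tensor product of the two maps is the Kronecker product of their matrices. A minimal Python sketch, with hypothetical helper names and arbitrary sample integer matrices, checks that this matrix carries the coordinates of $u \otimes v$ to those of $L(u) \otimes M(v)$:

```python
# Hypothetical helpers: rows of kron(A, B) are indexed by (r, s) and columns
# by (i, j), both in lexicographic order, with entry A[r][i] * B[s][j].

def kron(A, B):
    """Kronecker product of A and B as nested lists."""
    p, q = len(B), len(B[0])
    return [[A[r][i] * B[s][j] for i in range(len(A[0])) for j in range(q)]
            for r in range(len(A)) for s in range(p)]


def matvec(M, x):
    return [sum(row[k] * x[k] for k in range(len(x))) for row in M]


def tensor(x, y):
    """Coordinates of x (tensor) y in the lexicographically ordered basis."""
    return [a * b for a in x for b in y]


A = [[1, 2], [0, -1], [3, 1]]   # matrix of L : U1 -> U2 (dim 2 -> dim 3)
B = [[2, 0, 1], [1, 1, 0]]      # matrix of M : V1 -> V2 (dim 3 -> dim 2)
u, v = [1, -2], [3, 0, 5]

# (L (x) M)(u (x) v) should equal L(u) (x) M(v).
print(matvec(kron(A, B), tensor(u, v)) == tensor(matvec(A, u), matvec(B, v)))  # True
```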

13. Define $F$ to be the vector space $KU \oplus KV$, and let $l$ be the linear map $l: F \to T(E)$ given by $l(U) = Y$ and $l(V) = X^2 + XY + Y^2$. Let $L$ be the extension of $l$ to an algebra homomorphism $L: T(F) \to T(E)$ with $L(1) = 1$. The subalgebra in question is the image of $L$, and the affirmative answer to the question comes by showing that $L$ is one-one. It is enough to show that the basis elements consisting of all iterated products $U^{i_1} \otimes V^{j_1} \otimes U^{i_2} \otimes \cdots \otimes V^{j_n}$ are carried by $L$ to linearly independent elements. The image of this element is homogeneous of degree $\sum_k (i_k + 2j_k)$, and it is enough to consider only those images with the same homogeneity, i.e., with $\sum_k (i_k + 2j_k)$ constant. A failure of linear independence would mean that among these, the ones with the highest total power of $X$, namely with $\sum_{k=1}^n 2j_k$ maximal, must have their terms of highest degree in $X$ cancel together. Those terms are the monomials with $\sum_k i_k$ factors of $Y$ and $\sum_k j_k$ factors of $X^2$; distinct basis elements give distinct such monomials, and all such monomials, being also monomials in $X$ and $Y$, are linearly independent.

14. Let $\iota_E: E \to S(E)$ be the one-one linear map that embeds $E$ as $S^1(E) \subseteq S(E)$, and define $\iota_F$ similarly. The composition $\iota_F\varphi$ is a linear map of $E$ into the commutative associative algebra $S(F)$, and Proposition 6.23b yields a homomorphism $\Phi: S(E) \to S(F)$ of algebras with identity such that $\Phi\iota_E = \iota_F\varphi$. We take $\Phi$ as $S(\varphi)$, and this addresses (a). Part (c) is part of the construction of $S(\varphi)$. For (b), it is plain that $S(1_E) = 1_{S(E)}$. For compositions, suppose that $\psi: F \to G$ is linear and that $S(\psi)$ is formed similarly. Proposition 6.23b says that $S(\psi\varphi)$ is the unique homomorphism of $S(E)$ into $S(G)$ carrying 1 into 1 and satisfying $\iota_G\psi\varphi = S(\psi\varphi)\iota_E$. On the other hand, $S(\psi)S(\varphi)$ is another homomorphism of $S(E)$ into $S(G)$ carrying 1 into 1, and it satisfies $\iota_G(\psi\varphi) = (\iota_G\psi)\varphi = (S(\psi)\iota_F)\varphi = S(\psi)(\iota_F\varphi) = S(\psi)(S(\varphi)\iota_E) = (S(\psi)S(\varphi))\iota_E$. Therefore $S(\psi\varphi) = S(\psi)S(\varphi)$ by uniqueness, and $S$ is a functor.

15. The homomorphism $\Phi$ carries each $T^n(E)$ into itself. Since $\Phi$ carries commutators into commutators, $\Phi(I) \subseteq I$. Thus $\Phi(T^n(E) \cap I) \subseteq T^n(E) \cap I$. Also, $\Phi$ commutes with the symmetrizer operator and hence carries $\widetilde S^n(E)$ into itself. We are given the equation $q\Phi(x) = \overline{\Phi}q(x)$ on all of $T^n(E)$. Since $\Phi$ carries $\widetilde S^n(E)$ into itself, we can interpret this as saying that $\Phi|_{\widetilde S^n(E)}$ is well defined, and then all the assertions in the problem have been addressed.

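16. Fix an ordered basis and check the result directly for $L$'s that correspond to elementary matrices. The determinant and the scalar effect on $\bigwedge^{\dim E}(E)$ both multiply under composition, and every invertible $L$ is a composition of such $L$'s; if $L$ is not invertible, both quantities are 0. The result follows.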

17. Part (a) is a consequence of uniqueness. The formula for (b) is $(\Phi(g)\varphi)(v) = \varphi(g^{-1}v)$ for $v$ in $K^n$.

18. For (a), take $\mathcal A$ to be the category of commutative associative algebras over $K$ with identity, $\mathcal V$ to be the category of vector spaces over $K$, and $\mathcal F: \mathcal A \to \mathcal V$ to be the forgetful functor that takes an algebra and retains only the vector-space structure. If a vector space $E$ is given, then $(S, \iota)$ is taken to be $(S(E), \iota_E)$, where $S(E)$ is the symmetric algebra of $E$ and $\iota_E: E \to \mathcal F(S(E))$ is the identification of $E$ with the first-order symmetric tensors.

For (b), take $\mathcal V$ again to be the category of vector spaces over $K$. Define $\mathcal A$ to be the category whose objects are pairs $(A, F)$ in which $A$ is an associative algebra over $K$ with identity and $F$ is a vector subspace of $A$ such that every element $f$ of $F$ has $f^2 = 0$, and whose morphisms $\varphi \in \operatorname{Morph}((A, F), (A_1, F_1))$ are algebra homomorphisms $\varphi: A \to A_1$ such that $\varphi(F) \subseteq F_1$. The functor $\mathcal F: \mathcal A \to \mathcal V$ is to take the pair $(A, F)$ to $F$ and is to take the morphism $\varphi$ to $\varphi|_F: F \to F_1$. If a vector space $E$ is given, we take $(S, \iota)$ to be $((\bigwedge E, \bigwedge^1 E), \iota_E)$, where $\iota_E: E \to \bigwedge^1 E = \mathcal F(\bigwedge E, \bigwedge^1 E)$ is the identification of $E$ with the first-order alternating tensors.

For (c), let the nonempty index set be $J$. Take $\mathcal V = \mathcal C^J$ and $\mathcal A = \mathcal C$. The functor $\mathcal F: \mathcal C \to \mathcal C^J$ is the "diagonal functor" taking an object $A$ to the $J$-tuple whose $j$th coordinate is $A$ for every $j$; this functor takes any morphism $\varphi \in \operatorname{Morph}_{\mathcal C}(A, A')$ to the $J$-tuple whose $j$th coordinate is $\varphi$ for every $j$. The given $E$ is to be a $J$-tuple of objects $\{X_j\}_{j \in J}$, $S$ is to be the coproduct $X = \coprod_{j \in J} X_j$, and $\iota: \{X_j\}_{j \in J} \to \mathcal F(S)$ is to be the given $J$-tuple $\{i_j\}_{j \in J}$ of morphisms of $X_j$ into $X$.

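19. Let $L$ be the unique member of $\operatorname{Morph}_{\mathcal A}(S, S')$ given as corresponding to $\iota'$ in $\operatorname{Morph}_{\mathcal V}(E, \mathcal F(S'))$, i.e., satisfying $\mathcal F(L)\iota = \iota'$. Similarly let $L'$ be the unique member of $\operatorname{Morph}_{\mathcal A}(S', S)$ corresponding to $\iota$ in $\operatorname{Morph}_{\mathcal V}(E, \mathcal F(S))$, i.e., satisfying $\mathcal F(L')\iota' = \iota$. Then $L'L$ and $1_S$ are in $\operatorname{Morph}_{\mathcal A}(S, S)$ and have $\mathcal F(1_S)\iota = 1_{\mathcal F(S)}\iota = \iota$ and $\mathcal F(L'L)\iota = (\mathcal F(L')\mathcal F(L))\iota = \mathcal F(L')(\mathcal F(L)\iota) = \mathcal F(L')\iota' = \iota$. By uniqueness, $1_S = L'L$. Similarly $LL' = 1_{S'}$.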

20. By definition, $T_A$ satisfies $T_A(L) = \mathcal F(L)\iota$ for $L \in \operatorname{Morph}_{\mathcal A}(S, A)$. For $\varphi$ in $\operatorname{Morph}_{\mathcal A}(A, A')$, we are to show that $G(\varphi)(T_A(L)) = T_{A'}(F(\varphi)(L))$. Substitution from the definitions gives $G(\varphi)(T_A(L)) = \mathcal F(\varphi)\mathcal F(L)\iota = \mathcal F(\varphi L)\iota$ and $T_{A'}(F(\varphi)(L)) = T_{A'}(\varphi L) = \mathcal F(\varphi L)\iota$. These are equal, and hence $\{T_A\}$ is a natural transformation. Since each $T_A$ is one-one onto by hypothesis, the system $\{T_A\}$ is a natural isomorphism.

21.  The  previous  problem  shows  that  F  is  naturally  isomorphic  to  G  and  that  F' 
is  naturally  isomorphic  to  G.  Hence  F  is  naturally  isomorphic  to  F' .  The  hypotheses 
of  Proposition  6.16  are  satisfied,  and  the  conclusion  is  that  the  object  S  is  isomorphic 
in  A  to  the  object  S'  by  a  specific  isomorphism  described  in  the  proposition. 

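22. Let $E$ and $F$ be in $\operatorname{Obj}(\mathcal V)$, and let $\varphi$ be in $\operatorname{Morph}_{\mathcal V}(E, F)$. Then $\iota_F\varphi$ is in $\operatorname{Morph}_{\mathcal V}(E, \mathcal F(S(F)))$, and the universal mapping property of $(S(E), \iota_E)$ produces a unique $\Phi$ in $\operatorname{Morph}_{\mathcal A}(S(E), S(F))$ such that $\mathcal F(\Phi)\iota_E = \iota_F\varphi$. We define $S(\varphi) = \Phi$. There is no difficulty in checking that $S(1_E) = 1_{S(E)}$. Let us check that if we are given also $\psi$ in $\operatorname{Morph}_{\mathcal V}(F, G)$, then $S(\psi)S(\varphi) = S(\psi\varphi)$. We know that $S(\psi\varphi)$ is the unique member of $\operatorname{Morph}_{\mathcal A}(S(E), S(G))$ satisfying $\iota_G\psi\varphi = \mathcal F(S(\psi\varphi))\iota_E$. On the other hand, $S(\psi)S(\varphi)$ is another member of $\operatorname{Morph}_{\mathcal A}(S(E), S(G))$, and it satisfies

$$\iota_G(\psi\varphi) = (\iota_G\psi)\varphi = (\mathcal F(S(\psi))\iota_F)\varphi = \mathcal F(S(\psi))(\iota_F\varphi) = \mathcal F(S(\psi))(\mathcal F(S(\varphi))\iota_E) = (\mathcal F(S(\psi))\mathcal F(S(\varphi)))\iota_E = \mathcal F(S(\psi)S(\varphi))\iota_E.$$

Therefore $S(\psi\varphi) = S(\psi)S(\varphi)$ by uniqueness, and $S$ is a functor.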

23. $\operatorname{Pfaff}(J) = 1$ because the only nonzero term comes from $\tau = 1$.

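24. The terms in which $\sigma$ contains a 1-cycle are each 0 because the diagonal entries of $X$ are 0. The remaining terms in which $\sigma$ contains some cycle of odd length will be grouped in disjoint pairs that add to 0. If such a $\sigma$ is given, choose the smallest label $1, \dots, 2n$ that is moved by a cycle of odd length within $\sigma$, and let $\tau$ be that cycle. Let $\sigma'$ be the product of $\tau^{-1}$ and the remaining cycles of $\sigma$. Since $\tau$ has length at least 3, $\sigma' \ne \sigma$, and the resulting unordered pairs $\{\sigma, \sigma'\}$ are disjoint. For the indices $i$ moved by $\tau$, $x_{i,\sigma(i)} = x_{i,\tau(i)}$ while $x_{i,\sigma'(i)} = x_{i,\tau^{-1}(i)}$. Using the skew symmetry of $X$ and the odd length of $\tau$, we obtain

$$\prod_{\tau(i) \ne i} x_{i,\sigma'(i)} = \prod_{\tau(i) \ne i} x_{i,\tau^{-1}(i)} = (-1)^{\operatorname{length}\tau}\prod_{\tau(i) \ne i} x_{\tau^{-1}(i),i} = (-1)^{\operatorname{length}\tau}\prod_{\tau(i) \ne i} x_{i,\tau(i)} = -\prod_{\tau(i) \ne i} x_{i,\sigma(i)}.$$

If $\tau(i) = i$, then $x_{i,\sigma(i)} = x_{i,\sigma'(i)}$. Thus $\prod_i x_{i,\sigma(i)} = -\prod_i x_{i,\sigma'(i)}$. Since $\operatorname{sgn}\sigma = \operatorname{sgn}\sigma'$, the terms for $\sigma$ and $\sigma'$ sum to 0.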
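The cancellation can be observed numerically. The brute-force Python sketch below (hypothetical helper names; the 6-by-6 integer skew-symmetric matrix is an arbitrary sample) sums $\operatorname{sgn}\sigma\prod_i x_{i,\sigma(i)}$ separately over the permutations having some cycle of odd length and over those whose cycles all have even length.

```python
from itertools import permutations


def cycle_lengths(p):
    """Lengths of the cycles of the permutation i -> p[i]."""
    seen, lengths = set(), []
    for start in range(len(p)):
        if start not in seen:
            length, i = 0, start
            while i not in seen:
                seen.add(i)
                i = p[i]
                length += 1
            lengths.append(length)
    return lengths


def term(X, p):
    """sgn(p) * prod_i X[i][p(i)], with sgn(p) = (-1)^(size - number of cycles)."""
    value = (-1) ** (len(p) - len(cycle_lengths(p)))
    for i, j in enumerate(p):
        value *= X[i][j]
    return value


def skew(upper, m):
    """m-by-m skew-symmetric matrix from its strictly upper entries, row by row."""
    X = [[0] * m for _ in range(m)]
    entries = iter(upper)
    for i in range(m):
        for j in range(i + 1, m):
            X[i][j] = next(entries)
            X[j][i] = -X[i][j]
    return X


X = skew([3, -1, 4, 1, -5, 9, 2, -6, 5, 3, 5, -8, 9, 7, 9], 6)
perms = list(permutations(range(6)))
odd_part = sum(term(X, p) for p in perms if any(c % 2 for c in cycle_lengths(p)))
even_part = sum(term(X, p) for p in perms if all(c % 2 == 0 for c in cycle_lengths(p)))
full = sum(term(X, p) for p in perms)
print(odd_part, even_part == full)  # 0 True
```

So only the permutations with all cycles of even length contribute to $\det X$, which is the reduction the following problems build on.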

25. If $\sigma$ is good, let $A_0$ consist of the smallest index in each cycle of $\sigma$, let $A$ be the union of all $\sigma^{2k}(A_0)$ for $k \ge 0$, and let $B$ be the union of all $\sigma^{2k+1}(A_0)$ for all $k \ge 0$. Certainly $A \cup B = \{1, \dots, 2n\}$, $\sigma(A) = B$, and $\sigma(B) = A$. We have to prove that $A \cap B = \emptyset$. If the intersection is nonempty, we have $\sigma^{2k}(a_0) = \sigma^{2l+1}(a_0')$ for some $a_0$ and $a_0'$ in $A_0$. Possibly by increasing $l$ by an even multiple of the order of $\sigma$, we may assume that $l \ge k$. Then $\sigma^{2(l-k)+1}(a_0') = a_0$. This says that $a_0'$ and $a_0$ lie in the same cycle. Being least indices in cycles, they must be equal. Then some odd power of $\sigma$ fixes $a_0$, and the cycle of $\sigma$ whose least element is $a_0$ must have odd length, contradiction.

The definitions of $A$ and $B$ in terms of $A_0$ are forced by the conditions in the statement of the problem, and therefore $A$ and $B$ are unique.
26. Since $A \cup B = \{1, \dots, 2n\}$ and $A \cap B = \emptyset$, we have $y(\sigma)z(\sigma) = \prod_{i=1}^{2n} x_{i,\sigma(i)}$. The definitions of $\tau$ and $\tau'$ make $y(\sigma) = s(\tau)\prod_{k=1}^n x_{\tau(2k-1),\tau(2k)}$ and $z(\sigma) = s'(\tau')\prod_{k=1}^n x_{\tau'(2k-1),\tau'(2k)}$. The construction has made the integers $\tau(2k-1)$ increasing and has made the inequalities $\tau(2k-1) < \tau(2k)$ hold, and similarly for $\tau'$. This proves the desired equality, apart from signs.

27. The previous problem shows that $(\operatorname{sgn}\sigma)\prod_{i=1}^{2n} x_{i,\sigma(i)}$ equals

$$(\operatorname{sgn}\sigma)\,s(\tau)\,s'(\tau')\prod_{k=1}^n x_{\tau(2k-1),\tau(2k)}\prod_{k=1}^n x_{\tau'(2k-1),\tau'(2k)}.$$

Thus we want to see that

$$(\operatorname{sgn}\sigma)\,s(\tau)\,s'(\tau') = (\operatorname{sgn}\tau)(\operatorname{sgn}\tau'). \qquad (*)$$

In proving (*), we retain the step in which factors $x_{ij}$ of $y(\sigma)$ and $z(\sigma)$ are replaced by $x_{ji}$ with a minus sign if $j < i$, but we may disregard the step in which the factors are then rearranged so that $\tau$ and $\tau'$ can be defined. In fact, this rearranging does not affect the signs of $\tau$ and $\tau'$. The reason is that if $\rho$ is in $\mathfrak S_n$ and if $\tilde\rho$ in $\mathfrak S_{2n}$ is defined by $\tilde\rho(2k-1) = 2\rho(k) - 1$ and $\tilde\rho(2k) = 2\rho(k)$, then $\operatorname{sgn}\tilde\rho = +1$; it is enough to check this fact when $\rho$ is a consecutive transposition, and in this case $\tilde\rho$ is the product of two transpositions and is even.

Turning to (*), we first consider the case in which $\sigma$, when written as a disjoint product of cycles, takes the integers $1, \dots, 2n$ in order. In this case we compute directly that $\tau = 1$, that $s(\tau)$ involves no sign changes, and that $\tau'$ is the product of cycles of odd length, with an individual cycle of $\tau'$ permuting cyclically all but the last member of a cycle of $\sigma$. Thus $\tau'$ is even. In the adjustment of factors of $z(\sigma)$, one minus sign is introduced because of each cycle in $\sigma$ and comes from the last and first indices in the cycle. Thus $s'(\tau')$ is $(-1)^p$, where $p$ is the number of cycles in $\sigma$, and this is also the value of $\operatorname{sgn}\sigma$ since each cycle of $\sigma$ has even length. Hence (*) holds for this $\sigma$.

A general good $\sigma$ is conjugate in $\mathfrak S_{2n}$ to one of the kind in the previous paragraph. Since $\mathfrak S_{2n}$ is generated by transpositions of consecutive integers, it is enough to show that if (*) holds for $\sigma$, then it holds for $\sigma' = (a\ \ a{+}1)\,\sigma\,(a\ \ a{+}1)$. First suppose that $\sigma(a) \ne a + 1$ and $\sigma(a + 1) \ne a$. Then a factor of $y(\sigma)$ gets replaced with a minus sign for $\sigma$ if and only if it gets replaced for $\sigma'$, and similarly for $z(\sigma)$. Hence $s(\tau)$ and $s'(\tau')$ are unchanged in passing from $\sigma$ to $\sigma'$. The effect on $\tau$ and $\tau'$, in view of the observation immediately after (*), is to multiply each on the left by $(a\ \ a{+}1)$. Thus $\operatorname{sgn}\tau$ and $\operatorname{sgn}\tau'$ are each reversed. Since $\operatorname{sgn}\sigma = \operatorname{sgn}\sigma'$, (*) remains valid for $\sigma'$.

Now suppose that $\sigma(a) = a + 1$. We may assume that $\sigma(a + 1) \ne a$ since otherwise $\sigma' = \sigma$. To fix the ideas, first suppose that $a$ is in $A$. Then one factor in $y(\sigma)$ is $x_{a,a+1}$, and the corresponding factor of $y(\sigma')$ is $x_{a+1,a}$. As a result $\tau$ is unchanged under the passage from $\sigma$ to $\sigma'$, but the number of minus signs contributing to $s(\tau)$ is increased by 1 and $s(\tau)$ is therefore reversed. Meanwhile, $\tau'$ is left multiplied by $(a\ \ a{+}1)$, and $s'(\tau')$ is unchanged. Thus (*) remains valid for $\sigma'$. If $a$ instead is in $B$, then the roles of $\tau$ and $\tau'$ are reversed in the above argument, but the conclusion about (*) is not affected. Finally suppose that $\sigma(a + 1) = a$ and $\sigma(a) \ne a + 1$. Then the argument is the same except that the number of signs contributing to $s(\tau)$ or $s'(\tau')$ is decreased by 1. In any event, (*) remains valid for $\sigma'$.

28. What is needed is an inverse construction that passes from the pair $(\tau, \tau')$ to $\sigma$. Define $\omega \in \mathfrak S_{2n}$ to be the commuting product of the $n$ transpositions $(2k{-}1\ \ 2k)$ for $1 \le k \le n$.

Assuming for the moment that we know that some index $a$ is to be in $A$, we see from the definitions above that $b = \sigma(a)$ is to be given by $b = \tau(\omega(\tau^{-1}(a)))$ and $b$ is to be in $B$. If, on the other hand, we know that some index $b$ is to be in $B$, then $\sigma(b)$ is to be given by $\tau'(\omega(\tau'^{-1}(b)))$ and is to be in $A$. Thus the cycle within $\sigma$ to which $a$ belongs has to be given by applying alternately $\tau\omega\tau^{-1}$ and then $\tau'\omega\tau'^{-1}$.

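The critical fact is that this cycle is necessarily even. In the contrary case we would have $\tau\omega\tau^{-1}(\tau'\omega\tau'^{-1}\tau\omega\tau^{-1})^k(a) = a$ for some $k$. Recall that $\tau\omega\tau^{-1}$ and $\tau'\omega\tau'^{-1}$ are involutions that move every index. If $k = 2l$, then this equality gives $(\tau\omega\tau^{-1}\tau'\omega\tau'^{-1})^l(\tau\omega\tau^{-1})(\tau'\omega\tau'^{-1}\tau\omega\tau^{-1})^l(a) = a$, which we can rewrite as $(\tau\omega\tau^{-1})(\tau'\omega\tau'^{-1}\tau\omega\tau^{-1})^l(a) = (\tau'\omega\tau'^{-1}\tau\omega\tau^{-1})^l(a)$; this equation is contradictory since $\tau\omega\tau^{-1}$ is a permutation that moves every index. If $k = 2l + 1$, then this equality gives $(\tau\omega\tau^{-1})(\tau'\omega\tau'^{-1}\tau\omega\tau^{-1})^l(\tau'\omega\tau'^{-1})(\tau\omega\tau^{-1}\tau'\omega\tau'^{-1})^l(\tau\omega\tau^{-1})(a) = a$ and hence $(\tau'\omega\tau'^{-1})(\tau\omega\tau^{-1}\tau'\omega\tau'^{-1})^l(\tau\omega\tau^{-1})(a) = (\tau\omega\tau^{-1}\tau'\omega\tau'^{-1})^l(\tau\omega\tau^{-1})(a)$; this equation is contradictory since $\tau'\omega\tau'^{-1}$ is a permutation that moves every index.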

What we know is that the smallest index in each cycle is to be in $A$. Thus we can use this process to construct $\sigma$ from $(\tau, \tau')$, one cycle at a time. For the first cycle the index 1 is to be in $A$; for the next cycle the smallest remaining index is to be in $A$, and so on. We have seen that the constructed $\sigma$ will be the product of even cycles, and we can define $A$ as the union of the images of the even powers of $\sigma$ on the least indices of each cycle, with $B$ as the complement. In this way we have formed $\sigma$ and its disjoint decomposition $\{1, \dots, 2n\} = A \cup B$, and it is apparent that $\tau$ and $\tau'$ are indeed the permutations formed in the usual passage from $\sigma$ to $(\tau, \tau')$ via $(A, B)$.
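Problems 23 through 28 lead to the classical identity $\det X = \operatorname{Pfaff}(X)^2$ for a skew-symmetric $X$ of size $2n$. A brute-force Python sketch can confirm this on small examples. It assumes the Pfaffian is the sum of $\operatorname{sgn}\tau\prod_k x_{\tau(2k-1),\tau(2k)}$ over the permutations $\tau$ with $\tau(1) < \tau(3) < \cdots$ and $\tau(2k-1) < \tau(2k)$ (the normalization described in Problem 26), and that $J$ is block diagonal with blocks $\begin{pmatrix}0&1\\-1&0\end{pmatrix}$; both conventions, like the helper names, are assumptions of the sketch.

```python
from itertools import permutations


def sign(p):
    """Sign of a permutation (given as a tuple), computed from its inversions."""
    inversions = sum(1 for a in range(len(p)) for b in range(a + 1, len(p)) if p[a] > p[b])
    return -1 if inversions % 2 else 1


def det(X):
    """Determinant by the Leibniz formula (adequate for tiny matrices)."""
    total = 0
    for p in permutations(range(len(X))):
        term = sign(p)
        for i, j in enumerate(p):
            term *= X[i][j]
        total += term
    return total


def pfaffian(X):
    """Sum of sgn(t) * prod_k X[t(2k-1)][t(2k)] over permutations t with
    t(2k-1) < t(2k) and t(1) < t(3) < ... (indices are 0-based below)."""
    m = len(X)
    total = 0
    for t in permutations(range(m)):
        firsts = t[0::2]
        if all(t[2 * k] < t[2 * k + 1] for k in range(m // 2)) and list(firsts) == sorted(firsts):
            term = sign(t)
            for k in range(m // 2):
                term *= X[t[2 * k]][t[2 * k + 1]]
            total += term
    return total


def skew(upper, m):
    """m-by-m skew-symmetric matrix from its strictly upper entries, row by row."""
    X = [[0] * m for _ in range(m)]
    entries = iter(upper)
    for i in range(m):
        for j in range(i + 1, m):
            X[i][j] = next(entries)
            X[j][i] = -X[i][j]
    return X


J = [[0] * 6 for _ in range(6)]
for k in range(0, 6, 2):
    J[k][k + 1], J[k + 1][k] = 1, -1

X = skew([3, -1, 4, 1, -5, 9, 2, -6, 5, 3, 5, -8, 9, 7, 9], 6)
print(pfaffian(J), pfaffian(X) ** 2 == det(X))  # 1 True
```

Enumerating all of $\mathfrak S_{2n}$ is only practical for very small $n$; the point here is to follow the definitions literally rather than to compute efficiently.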

29. It is enough to prove that $\varphi|_{V_n}: V_n \to V_n^\#$ is an isomorphism for every $n$. We establish this property by induction on $n$, the trivial case for the induction being $n = -1$. Suppose that

$$\varphi|_{V_{n-1}}: V_{n-1} \to V_{n-1}^\# \text{ is an isomorphism.} \qquad (*)$$

By assumption

$$\operatorname{gr}^n\varphi: V_n/V_{n-1} \to V_n^\#/V_{n-1}^\# \text{ is an isomorphism.} \qquad (**)$$

If $v$ is in $\ker(\varphi|_{V_n})$, then $(\operatorname{gr}^n\varphi)(v + V_{n-1}) = 0 + V_{n-1}^\#$, and (**) shows that $v$ is in $V_{n-1}$. By (*), $v = 0$. Thus $\varphi|_{V_n}$ is one-one. Next suppose that $v^\#$ is in $V_n^\#$. By (**) there exists $v_n$ in $V_n$ such that $(\operatorname{gr}^n\varphi)(v_n + V_{n-1}) = v^\# + V_{n-1}^\#$. Write $\varphi(v_n) = v^\# + v_{n-1}^\#$ with $v_{n-1}^\#$ in $V_{n-1}^\#$. By (*) there exists $v_{n-1}$ in $V_{n-1}$ with $\varphi(v_{n-1}) = v_{n-1}^\#$. Then $\varphi(v_n - v_{n-1}) = v^\#$, and thus $\varphi|_{V_n}$ is onto. This completes the induction.
30. We define a product $(A_m/A_{m-1}) \times (A_n/A_{n-1}) \to A_{m+n}/A_{m+n-1}$ by

$$(a_m + A_{m-1})(a_n + A_{n-1}) = a_ma_n + A_{m+n-1}.$$

This is well defined since $a_mA_{n-1}$, $A_{m-1}a_n$, and $A_{m-1}A_{n-1}$ are all contained in $A_{m+n-1}$. It is clear that this multiplication is distributive and associative as far as it is defined. We extend the definition of multiplication to all of $\operatorname{gr} A$ by taking sums of products of homogeneous elements, and the result is an associative algebra. The identity is the element $1 + A_{-1}$ of $A_0/A_{-1}$.

31. $[x, x] = xx - xx = 0$, and also

$$[x, [y, z]] + [y, [z, x]] + [z, [x, y]] = (xyz - xzy - yzx + zyx) + (yzx - yxz - zxy + xzy) + (zxy - zyx - xyz + yxz) = 0.$$
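The identity in Problem 31 holds in any associative algebra. A quick exact check with integer 3-by-3 matrices, taking the bracket to be the commutator $xy - yx$, is sketched below in Python; the sample matrices and helper names are arbitrary choices for illustration.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def bracket(A, B):
    """Commutator [A, B] = AB - BA."""
    AB, BA = matmul(A, B), matmul(B, A)
    return [[p - q for p, q in zip(r1, r2)] for r1, r2 in zip(AB, BA)]


x = [[1, 2, 0], [0, -1, 3], [4, 0, 2]]
y = [[0, 1, 1], [2, 0, -2], [1, 5, 0]]
z = [[3, 0, -1], [1, 1, 0], [0, 2, -3]]
zero = [[0] * 3 for _ in range(3)]

jacobi = add(add(bracket(x, bracket(y, z)), bracket(y, bracket(z, x))), bracket(z, bracket(x, y)))
print(bracket(x, x) == zero, jacobi == zero)  # True True
```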

32. In (a), let $x$ and $y$ be in $\mathfrak g$. Then we have

$$\begin{aligned}
[x, y]^tA + A[x, y] &= (xy - yx)^tA + A(xy - yx) \\
&= y^tx^tA - x^ty^tA + Axy - Ayx \\
&= y^t(x^tA + Ax) - x^t(y^tA + Ay) + (x^tA + Ax)y - (y^tA + Ay)x = 0.
\end{aligned}$$

Part (b) is the special case $A = I$.
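As a sanity check, one can take a concrete $A$ and concrete members of $\mathfrak g = \{x : x^tA + Ax = 0\}$ and confirm that their bracket stays in $\mathfrak g$. The Python sketch below chooses $A = \operatorname{diag}(1, 1, -1)$ purely for illustration and produces members of $\mathfrak g$ as $x = AS$ with $S$ skew-symmetric, which works for this particular $A$ because $A^t = A$ and $A^2 = I$; the helper names are hypothetical.

```python
def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def transpose(P):
    return [list(r) for r in zip(*P)]


def combine(P, Q, sign=1):
    """Entrywise P + sign * Q."""
    return [[p + sign * q for p, q in zip(rp, rq)] for rp, rq in zip(P, Q)]


def in_g(x, A):
    """True if x^t A + A x = 0."""
    return combine(matmul(transpose(x), A), matmul(A, x)) == [[0] * len(A) for _ in A]


A = [[1, 0, 0], [0, 1, 0], [0, 0, -1]]
S1 = [[0, 2, -1], [-2, 0, 3], [1, -3, 0]]   # skew-symmetric
S2 = [[0, -1, 4], [1, 0, 2], [-4, -2, 0]]   # skew-symmetric
x, y = matmul(A, S1), matmul(A, S2)
xy_bracket = combine(matmul(x, y), matmul(y, x), sign=-1)
print(in_g(x, A), in_g(y, A), in_g(xy_bracket, A))  # True True True
```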

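33. Uniqueness follows from the fact that 1 and $\iota(\mathfrak g)$ generate $U(\mathfrak g)$. For existence let $L: T(\mathfrak g) \to A$ be the extension given by the universal mapping property of $T(\mathfrak g)$ in Proposition 6.22. To obtain the required homomorphism of $U(\mathfrak g)$ into $A$, we are to show that $L$ annihilates the ideal $I'$. It is enough to consider $L$ on a typical generator of $I'$, where we have

$$\begin{aligned}
L(\iota X \otimes \iota Y - \iota Y \otimes \iota X - \iota[X, Y]) &= L(\iota X)L(\iota Y) - L(\iota Y)L(\iota X) - L(\iota[X, Y]) \\
&= l(X)l(Y) - l(Y)l(X) - l[X, Y] \\
&= 0.
\end{aligned}$$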

34. First one proves the following: if $Z_1, \dots, Z_p$ are in $\mathfrak g$ and $\sigma$ is a permutation of $\{1, \dots, p\}$, then $(\iota Z_1)\cdots(\iota Z_p) - (\iota Z_{\sigma(1)})\cdots(\iota Z_{\sigma(p)})$ is in $U_{p-1}(\mathfrak g)$. In fact, since every permutation is a product of transpositions of consecutive integers, it is enough to prove this statement when $\sigma$ is the transposition of $j$ with $j + 1$. In this case the statement follows from the identity $(\iota Z_j)(\iota Z_{j+1}) - (\iota Z_{j+1})(\iota Z_j) = \iota[Z_j, Z_{j+1}]$ by multiplying through on the left by $(\iota Z_1)\cdots(\iota Z_{j-1})$ and on the right by $(\iota Z_{j+2})\cdots(\iota Z_p)$.

For the assertion in the problem, if we use all monomials with $\sum_m j_m \le p$, we certainly have a spanning set, since the obvious preimages in $T(\mathfrak g)$ span $\bigoplus_{k \le p} T^k(\mathfrak g)$. The result of the previous paragraph then implies inductively that the monomials with monotone increasing indices suffice.

35. We shall construct the map in the opposite direction without using the Poincaré-Birkhoff-Witt Theorem, appeal to the theorem to show that we have an isomorphism, and then compute what the map is in terms of a basis. Let $T_n(\mathfrak g) = \bigoplus_{k=0}^n T^k(\mathfrak g)$ be the $n$th member of the usual filtration of $T(\mathfrak g)$. Define $U_n(\mathfrak g)$ to be the image in $U(\mathfrak g)$ of $T_n(\mathfrak g)$ under the passage $T(\mathfrak g) \to T(\mathfrak g)/I'$. Form the composition

$$T_n(\mathfrak g) \to (T_n(\mathfrak g) + I')/I' = U_n(\mathfrak g) \to U_n(\mathfrak g)/U_{n-1}(\mathfrak g).$$

This composition is onto and carries $T_{n-1}(\mathfrak g)$ to 0. Since $T^n(\mathfrak g)$ is a vector-space complement to $T_{n-1}(\mathfrak g)$ in $T_n(\mathfrak g)$, we obtain an onto linear map $T^n(\mathfrak g) \to U_n(\mathfrak g)/U_{n-1}(\mathfrak g)$. Taking the direct sum over $n$ gives an onto linear map

$$\psi: T(\mathfrak g) \to \operatorname{gr} U(\mathfrak g)$$

that respects the grading.

Let $I$ be the two-sided ideal in $T(\mathfrak g)$ such that $S(\mathfrak g) = T(\mathfrak g)/I$. It is generated by all $X \otimes Y - Y \otimes X$ with $X$ and $Y$ in $T^1(\mathfrak g)$. Let us show that the linear map $\psi: T(\mathfrak g) \to \operatorname{gr} U(\mathfrak g)$ respects multiplication and annihilates the defining ideal $I$ for $S(\mathfrak g)$; then we can conclude that $\psi$ descends to an algebra homomorphism

$$\psi: S(\mathfrak g) \to \operatorname{gr} U(\mathfrak g)$$

that respects the grading.

To do so, let $x$ be in $T^r(\mathfrak g)$ and let $y$ be in $T^s(\mathfrak g)$. Then $x + I'$ is in $U_r(\mathfrak g)$, and we may regard $\psi(x)$ as the coset $x + T_{r-1}(\mathfrak g) + I'$ in $U_r(\mathfrak g)/U_{r-1}(\mathfrak g)$, with 0 in all other coordinates of $\operatorname{gr} U(\mathfrak g)$ since $x$ is homogeneous. Arguing in a similar fashion with $y$ and $xy$, we obtain

$$\psi(x) = x + T_{r-1}(\mathfrak g) + I', \qquad \psi(y) = y + T_{s-1}(\mathfrak g) + I', \qquad \psi(xy) = xy + T_{r+s-1}(\mathfrak g) + I'.$$

Since $I'$ is an ideal, $\psi(x)\psi(y) = \psi(xy)$. General members $x$ and $y$ of $T(\mathfrak g)$ are sums of homogeneous elements, and hence $\psi$ respects multiplication.

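Consequently $\ker\psi$ is a two-sided ideal. To show that $\ker\psi \supseteq I$, it is enough to show that $\ker\psi$ contains all generators $X \otimes Y - Y \otimes X$. We have

$$\psi(X \otimes Y - Y \otimes X) = X \otimes Y - Y \otimes X + T_1(\mathfrak g) + I' = [X, Y] + T_1(\mathfrak g) + I' = T_1(\mathfrak g) + I',$$

and thus $\psi$ maps the generator to 0. Hence $\psi$ descends to a homomorphism of $S(\mathfrak g)$ into $\operatorname{gr} U(\mathfrak g)$ as asserted.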

Finally we show that this homomorphism is an isomorphism. Let $\{X_i\}$ be an ordered basis of $\mathfrak g$. We know that the monomials $X_{i_1}^{j_1}\cdots X_{i_k}^{j_k}$ in $S(\mathfrak g)$ with $i_1 < \cdots < i_k$ and with $\sum_m j_m = n$ form a basis of $S^n(\mathfrak g)$. Let us follow the effect of $\psi$ on such a monomial. A preimage of this monomial in $T^n(\mathfrak g)$ is the element

$$X_{i_1} \otimes \cdots \otimes X_{i_1} \otimes \cdots \otimes X_{i_k} \otimes \cdots \otimes X_{i_k},$$

in which there are $j_m$ factors of $X_{i_m}$ for $1 \le m \le k$. This element maps to the monomial in $U_n(\mathfrak g)$ that we have denoted by $X_{i_1}^{j_1}\cdots X_{i_k}^{j_k}$, and then we pass to the quotient $U_n(\mathfrak g)/U_{n-1}(\mathfrak g)$. The Poincaré-Birkhoff-Witt Theorem shows that such monomials modulo $U_{n-1}(\mathfrak g)$ form a basis of $U_n(\mathfrak g)/U_{n-1}(\mathfrak g)$. Consequently $\psi$ is an isomorphism.

36.  This  is  quite  similar  to  Problem  33. 

37.  This  is  similar  to  Problem  34. 

38.  What  is  needed  here  is  a  description  of  a  triple  product  of  generators  in 
terms  of  permuting  indices  and  replacing  repeated  pairs  of  indices  by  a  scalar; 
the  description  does  not  depend  on  the  way  that  the  parentheses  are  inserted  in  a 
triple  product,  and  then  associativity  follows.  The  details  are  omitted. 

39. Using the universal mapping property of Problem 36, construct an algebra homomorphism $L: \operatorname{Cliff}(E, \langle\cdot,\cdot\rangle) \to C$ carrying 1 into 1 and extending the mapping $e_i \mapsto e_i$. Since the $e_i$'s and 1 generate $C$, $L$ is onto $C$. Problem 37 shows that $\dim \operatorname{Cliff}(E, \langle\cdot,\cdot\rangle) \le 2^n$, and we know that $\dim C = 2^n$. Since $L$ is onto, this dimension count forces $L$ to be one-one as well.

40. This is similar to Problem 35. The substitute for the Poincaré-Birkhoff-Witt Theorem is the fact established by Problem 39 that the spanning set of $2^n$ elements in Problem 37 is actually a basis.

41. The matrix that corresponds to $X_0$ has $r = -2$.

42. To see that $\gamma$ has the asserted properties, form the quotient map $T(H(V)) \to T(V)$ by factoring out the two-sided ideal generated by $X_0 - 1$. The composition $T(H(V)) \to T(V) \to W(V)$ is obtained by factoring out the two-sided ideal generated by $X_0 - 1$ and all $u \otimes v - v \otimes u - \langle u, v\rangle 1$, hence by all $u \otimes v - v \otimes u - \langle u, v\rangle X_0$ and by $X_0 - 1$. Thus $T(H(V)) \to W(V)$ factors into the standard quotient map $T(H(V)) \to U(H(V))$ followed by the quotient map of $U(H(V))$ by the ideal generated by $X_0 - 1$. By uniqueness in the universal mapping property for universal enveloping algebras, $\gamma$ is given by factoring out by $X_0 - 1$.

43. Let $P$ be the extension of $\varphi$ to an associative algebra homomorphism of $U(H(V))$ into $A$. Then $P(X_0) = 1$ since $\varphi(X_0) = 1$. The previous problem shows that $P$ descends to $W(V)$, i.e., that there exists $\tilde\varphi$ with $P = \tilde\varphi \circ \gamma$. Restriction to $V$ gives $\varphi = \tilde\varphi \circ \iota$.

44.  This  is  immediate  from  Problem  42  and  the  spanning  in  Problem  34. 

46. The linear combination $L_j = \varphi(p_j) + 2\pi\varphi(q_j)$ of the two given linear mappings $\varphi(p_j) = \partial/\partial x_j$ and $\varphi(q_j) = m_j$ (multiplication by $x_j$) replaces $P(x)$ in $e^{-\pi|x|^2}P(x)$ by $\partial P/\partial x_j$. Take a nonzero $e^{-\pi|x|^2}P(x)$ in an invariant subspace $U$, let $x_1^{k_1}\cdots x_n^{k_n}$ be a monomial of maximal total degree in $P(x)$, and apply $L_1^{k_1}\cdots L_n^{k_n}$ to $e^{-\pi|x|^2}P(x)$. No other monomial of $P$ survives these differentiations, and so the result is a nonzero multiple of $e^{-\pi|x|^2}$, which is therefore in $U$. Then apply products of powers of the various $m_j$'s to this to see that all of $V$ is contained in $U$.

47. Let $r_j = p_j + 2\pi q_j$, so that $\varphi(r_j)(Pe^{-\pi|x|^2}) = (\partial P/\partial x_j)e^{-\pi|x|^2}$. It is enough to prove that no nontrivial linear combination of the members of the spanning set $q_1^{k_1}\cdots q_n^{k_n}r_1^{l_1}\cdots r_n^{l_n}$ maps to 0 under $\varphi$. Let a linear combination of such terms map to 0 under $\varphi$. Among all the terms that occur in the linear combination with nonzero coefficient, let $(L_1, \dots, L_n)$ be the smallest tuple of exponents $(l_1, \dots, l_n)$ that occurs; here "smallest" refers to the lexicographic ordering taking $l_1$ first, then $l_2$, and so on. Put $P(x_1, \dots, x_n) = x_1^{L_1}\cdots x_n^{L_n}$. If $(l_1, \dots, l_n) > (L_1, \dots, L_n)$ lexicographically, then $l_j > L_j$ at the first index $j$ where the two tuples differ, and hence $\varphi(r_1^{l_1}\cdots r_n^{l_n})(Pe^{-\pi|x|^2}) = 0$. Thus $\varphi(q_1^{k_1}\cdots q_n^{k_n}r_1^{l_1}\cdots r_n^{l_n})(Pe^{-\pi|x|^2})$ is 0 if $(l_1, \dots, l_n) > (L_1, \dots, L_n)$ lexicographically and equals $x_1^{k_1}\cdots x_n^{k_n}\,L_1!\cdots L_n!\,e^{-\pi|x|^2}$ if $(l_1, \dots, l_n) = (L_1, \dots, L_n)$. The linear independence follows immediately.
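The key computation is that $\partial^{l_1}_{x_1}\cdots\partial^{l_n}_{x_n}$ kills $x_1^{L_1}\cdots x_n^{L_n}$ unless $l_j \le L_j$ for every $j$. The Python sketch below, with a hypothetical helper `diff_monomial` and an arbitrary list of exponent tuples, shows that choosing $L$ lexicographically smallest among the occurring tuples isolates the terms with $l = L$, whereas the lexicographically largest tuple need not.

```python
from math import factorial, prod


def diff_monomial(L, l):
    """Apply (d/dx_1)^{l_1} ... (d/dx_n)^{l_n} to x_1^{L_1} ... x_n^{L_n}.

    Returns (coefficient, exponents); the coefficient is 0 when some l_j > L_j."""
    if any(lj > Lj for lj, Lj in zip(l, L)):
        return 0, None
    coeff = prod(factorial(Lj) // factorial(Lj - lj) for lj, Lj in zip(l, L))
    return coeff, tuple(Lj - lj for lj, Lj in zip(l, L))


# Exponent tuples (l_1, ..., l_n) of the r-factors in a hypothetical combination.
exponents = [(2, 0, 0), (1, 0, 0), (1, 2, 0), (0, 1, 3)]
L = min(exponents)  # Python compares tuples lexicographically

others = [diff_monomial(L, l)[0] for l in exponents if l != L]
print(L, others, diff_monomial(L, L))  # (0, 1, 3) [0, 0, 0] (6, (0, 0, 0))

# Using the lexicographically largest tuple instead does not isolate a term:
print(diff_monomial(max(exponents), (1, 0, 0)))  # (2, (1, 0, 0))
```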

48.  This  is  similar  to  Problems  35  and  40.  The  key  fact  needed  is  the  linear 
independence  established  in  the  previous  problem. 

52. In (a), for $[a, b, c]$ to be alternating means that $[a, a, c] = [a, b, a] = [b, a, a] = 0$. These say that $(aa)c - a(ac) = (ab)a - a(ba) = (ba)a - b(aa) = 0$. For (b), $[a, a, c] = [b, a, a] = 0$ and the 3-linearity together imply that

$$[a, b, a] = [a, b, a] + [b, b, a] = [a + b, b, a] = [a + b, b, a] + [a + b, a, a] = [a + b, a + b, a] = 0.$$

53. For (a), $(1, 0)(c, d) = (c, d)$ and $(a, b)(1, 0) = (a, b)$ directly from the definition. Also, the definition $(a, b)^* = (a^*, -b)$ makes $(1, 0)^* = (1, 0)$, $(a, b)^{**} = (a^*, -b)^* = (a^{**}, b) = (a, b)$, and

$$(c, d)^*(a, b)^* = (c^*, -d)(a^*, -b) = (c^*a^* - bd^*, -c^{**}b - a^*d) = ((c^*a^* - bd^*)^*, a^*d + cb)^* = (ac - db^*, a^*d + cb)^* = ((a, b)(c, d))^*.$$

For (b), (c), and (d), we observe that

$$((a, b)(c, d))(e, f) = (ac \cdot e - db^* \cdot e - f \cdot d^*a - f \cdot b^*c^*,\ c^*a^* \cdot f - bd^* \cdot f + e \cdot a^*d + e \cdot cb)$$

and

$$(a, b)((c, d)(e, f)) = (a \cdot ce - a \cdot fd^* - c^*f \cdot b^* - ed \cdot b^*,\ a^* \cdot c^*f + a^* \cdot ed + ce \cdot b - fd^* \cdot b),$$

and the results are immediate.

In (e), (i) is the usual construction, and (ii) has $1 = (1, 0)$, $i = (i, 0)$, $j = (0, 1)$, and $k = (0, -i)$, with the identity of $\mathbb H$ written now as 1.
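A small Python sketch of the doubling construction with $A = \mathbb C$, using Python's built-in complex numbers and the product rule $(a, b)(c, d) = (ac - db^*, a^*d + cb)$ that appears in the computation for (a). It checks the quaternion relations for $i = (i, 0)$, $j = (0, 1)$, $k = (0, -i)$ and the identity $((a, b)(c, d))^* = (c, d)^*(a, b)^*$ on sample Gaussian-integer pairs; the helper names are hypothetical.

```python
# Doubling of C: pairs (a, b) of complex numbers with
# (a, b)(c, d) = (ac - d b*, a* d + c b) and (a, b)* = (a*, -b).

def mul(x, y):
    (a, b), (c, d) = x, y
    return (a * c - d * b.conjugate(), a.conjugate() * d + c * b)


def star(x):
    a, b = x
    return (a.conjugate(), -b)


def neg(x):
    return (-x[0], -x[1])


one, i, j, k = (1 + 0j, 0j), (1j, 0j), (0j, 1 + 0j), (0j, -1j)

# Quaternion relations.
print(mul(i, j) == k, mul(j, k) == i, mul(k, i) == j)   # True True True
print(all(mul(u, u) == neg(one) for u in (i, j, k)))    # True

# Part (a) on sample points: (xy)* = y* x*.
x, y = (2 + 1j, -1 + 3j), (1 - 2j, 4 + 0j)
print(star(mul(x, y)) == mul(star(y), star(x)))         # True
```

Exact equality is safe here because all the sample entries are small Gaussian integers, so the floating-point complex arithmetic involves no rounding.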

54. For (a), $(a, b)^* + (a, b) = (a^*, -b) + (a, b) = (a^* + a, 0)$, which is a real multiple of $(1, 0)$. Also, $(a, b)(a, b)^* = (a, b)(a^*, -b) = (aa^* + bb^*, a^*(-b) + a^*b) = (aa^* + bb^*, 0)$, and for $(a, b) \ne 0$ this is a positive multiple of $(1, 0)$ since $aa^*$ and $bb^*$ are $\ge 0$ and at least one of them is positive. A similar argument applies to $(a, b)^*(a, b)$.

In (b), certainly $\langle a, b\rangle$ is bilinear over $\mathbb R$, the expression for $\langle a, b\rangle$ is manifestly symmetric, and we know that $\langle a, a\rangle = aa^*$ is $\ge 0$ with equality only for $a = 0$.

In (c), we are to prove that $(xx)y = x(xy)$ and $(yx)x = y(xx)$ in $B$. It is enough to prove the first identity since application of $*$ to it gives the second identity. We use $(c, d) = (a, b)$ and substitute into the displayed formulas above for Problem 53. We find that $((a, b)(a, b))(e, f)$ equals

$$(aa \cdot e - bb^* \cdot e - f \cdot b^*a - f \cdot b^*a^*,\ a^*a^* \cdot f - bb^* \cdot f + e \cdot a^*b + e \cdot ab)$$

and that $(a, b)((a, b)(e, f))$ equals

$$(a \cdot ae - a \cdot fb^* - a^*f \cdot b^* - eb \cdot b^*,\ a^* \cdot a^*f + a^* \cdot eb + ae \cdot b - fb^* \cdot b).$$

Taking into account the associativity of $A$, we see that it is enough to show that $(bb^*)e = e(bb^*)$, $fb^*(a + a^*) = (a + a^*)fb^*$, $(bb^*)f = f(bb^*)$, and $e(a + a^*) = (a + a^*)e$. These all follow from the fact that $A$ is nicely normed.

55. Part (a) follows from (a) and (c) of the previous problem.

In (b), write $x^* = c1 - x$ with $c$ real. Then we have $(xx^*)y = (x(c1 - x))y = cxy - (xx)y = cxy - x(xy) = x(cy - xy) = x((c1 - x)y) = x(x^*y)$. The equality $x(yy^*) = (xy)y^*$ follows by applying $*$ and renaming the variables.

In (c), use of (b) and the definitions of the norm and $*$ gives

$$\|ab\|^2a = ((ab)(ab)^*)a = (ab)((ab)^*a) = (ab)((b^*a^*)a) = (ab)(b^*(a^*a)) = \|a\|^2((ab)b^*) = \|a\|^2a(bb^*) = \|a\|^2\|b\|^2a.$$

For (d), the norm equality of (c) implies that for $a \ne 0$ the $\mathbb R$ linear maps left-by-$a$ and right-by-$a$ are one-one, and the finite dimensionality of $\mathbb O$ allows us to conclude that they are onto. Hence they are invertible.

For (e), use of (b) gives $a(\|a\|^{-2}a^*b) = \|a\|^{-2}a(a^*b) = \|a\|^{-2}(aa^*)b = \|a\|^{-2}\|a\|^2b = b$. This proves the result for left multiplication, and the argument for right multiplication is similar.

For (f), the table is as follows, with each entry representing the product of the element at the left (the row index) by the element at the top (the column index). Since $(1,0)$ is the identity, the first row and the first column list the indices themselves.

 (1,0)   (i,0)   (j,0)   (k,0)   (0,1)   (0,i)   (0,j)   (0,k)
 (i,0)  -(1,0)   (k,0)  -(j,0)  -(0,i)   (0,1)  -(0,k)   (0,j)
 (j,0)  -(k,0)  -(1,0)   (i,0)  -(0,j)   (0,k)   (0,1)  -(0,i)
 (k,0)   (j,0)  -(i,0)  -(1,0)  -(0,k)  -(0,j)   (0,i)   (0,1)
 (0,1)   (0,i)   (0,j)   (0,k)  -(1,0)  -(i,0)  -(j,0)  -(k,0)
 (0,i)  -(0,1)  -(0,k)   (0,j)   (i,0)  -(1,0)  -(k,0)   (j,0)
 (0,j)   (0,k)  -(0,1)  -(0,i)   (j,0)   (k,0)  -(1,0)  -(i,0)
 (0,k)  -(0,j)   (0,i)  -(0,1)   (k,0)  -(j,0)   (i,0)  -(1,0)
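The table can be regenerated mechanically. The Python sketch below is an illustration, not part of the text. It stores quaternions as (real, i, j, k) tuples, applies the doubling rule $(a, b)(c, d) = (ac - db^*, a^*d + cb)$ to pairs of quaternions, and prints the products of the eight basis elements in the row-by-column convention of the table. It also spot-checks, on sample integer octonions, that the norm is multiplicative and that $(xx)y = x(xy)$. All helper names are hypothetical.

```python
def qmul(p, q):
    """Hamilton product of quaternions written as (real, i, j, k) tuples."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
            a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
            a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
            a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2)


def qconj(q):
    return (q[0], -q[1], -q[2], -q[3])


def qadd(p, q):
    return tuple(s + t for s, t in zip(p, q))


def qsub(p, q):
    return tuple(s - t for s, t in zip(p, q))


def omul(x, y):
    """Octonions as 8-tuples (first quaternion, second quaternion)."""
    a, b, c, d = x[:4], x[4:], y[:4], y[4:]
    return qsub(qmul(a, c), qmul(d, qconj(b))) + qadd(qmul(qconj(a), d), qmul(c, b))


NAMES = ["(1,0)", "(i,0)", "(j,0)", "(k,0)", "(0,1)", "(0,i)", "(0,j)", "(0,k)"]
BASIS = [tuple(1 if t == s else 0 for t in range(8)) for s in range(8)]


def label(x):
    """Name of a signed basis element such as '-(0,i)'."""
    [(s, c)] = [(s, c) for s, c in enumerate(x) if c != 0]
    return ("-" if c < 0 else "") + NAMES[s]


TABLE = [[label(omul(r, c)) for c in BASIS] for r in BASIS]
for row in TABLE:
    print("  ".join(f"{entry:>6}" for entry in row))


def norm2(x):
    return sum(t * t for t in x)


x = (1, 2, 0, -1, 3, 0, 1, 2)
y = (0, -1, 2, 1, 1, 1, -2, 0)
print(norm2(omul(x, y)) == norm2(x) * norm2(y))      # True: multiplicative norm
print(omul(omul(x, x), y) == omul(x, omul(x, y)))    # True: alternative law
```

The same code shows non-associativity directly: for example, $((i,0)(j,0))(0,1) = -(0,k)$ while $(i,0)((j,0)(0,1)) = (0,k)$.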

56.  Although  B  is  nicely  normed,  the  steps  of  (b)  in  Problem  55  are  not  justified 
for  it  because  we  cannot  conclude  that  B  is  alternative.  Since  the  argument  for 
(b)  breaks  down,  so  do  the  arguments  for  (c)  and  (d). 
Chapter VII

1. The integers < 60 that are not products of powers of at most two primes are 30 and 42. For 42 = 2·3·7, the number of Sylow 7-subgroups divides 6 and is $\equiv 1 \bmod 7$, hence is 1, and the Sylow 7-subgroup is normal. Thus Burnside's Theorem, together with this Sylow count, assures us that the only possible order less than 60 for a nonabelian simple group is 30. The integer 30 is of the form $2pq$ with $p = 3$ and $q = 5$, and $q + 1 = 2p$. Part (b) of Problem 34 at the end of Chapter IV is applicable and shows that the group has a subgroup of index 2; subgroups of index 2 are always normal.
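A brute-force listing makes the case analysis concrete: it finds the orders below 60 having at least three distinct prime factors (the ones Burnside's Theorem does not dispose of), and the possible numbers of Sylow 7-subgroups in a group of order 42. This is a small illustrative sketch; `prime_divisors` is a helper written for it.

```python
def prime_divisors(n):
    """Distinct primes dividing n, by trial division."""
    primes, d = set(), 2
    while d * d <= n:
        while n % d == 0:
            primes.add(d)
            n //= d
        d += 1
    if n > 1:
        primes.add(n)
    return primes

# Orders below 60 that are not of the form p^a q^b
EXCEPTIONS = [n for n in range(2, 60) if len(prime_divisors(n)) > 2]

# For |G| = 42 = 2 * 3 * 7: divisors of 6 that are congruent to 1 mod 7
SYLOW_7_COUNTS = [d for d in range(1, 7) if 6 % d == 0 and d % 7 == 1]

print(EXCEPTIONS, SYLOW_7_COUNTS)   # [30, 42] [1]
```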


2. For (a) and (b), $(xyx^{-1}y^{-1})^{-1} = yxy^{-1}x^{-1}$ is a commutator, and so is
$$a(xyx^{-1}y^{-1})a^{-1} = (axa^{-1})(aya^{-1})(axa^{-1})^{-1}(aya^{-1})^{-1}.$$

3. Let $H$ be generated by $a$ and $b$, and let $K$ be generated by $bab^2$ and $bab^3$. Certainly $K \subseteq H$. Since $bab^2$ and $bab^3$ are in $K$, so is $(bab^2)^{-1}(bab^3) = b$, and then so is $b^{-1}(bab^2)b^{-2} = a$. Hence $H \subseteq K$.

4. If $H$ is characteristic, then in particular every inner automorphism $x \mapsto gxg^{-1}$ carries $H$ to itself, and $H$ is normal. If $\varphi: G \to G$ is an automorphism and $z$ is in $Z_G$, then the equality $\varphi(z)\varphi(g) = \varphi(zg) = \varphi(gz) = \varphi(g)\varphi(z)$ and the fact that $\varphi$ is onto $G$ show that $\varphi(z)$ is in $Z_G$. If $\psi: G \to G$ is an automorphism, then $\psi(xyx^{-1}y^{-1}) = \psi(x)\psi(y)\psi(x)^{-1}\psi(y)^{-1}$ shows that $\psi$ carries commutators to commutators; hence $\psi$ carries the generated subgroup $G'$ to itself.

5. $H_8$, $Z_{H_8}$, and $\{1\}$ are characteristic. But the subgroups of order 4 are not, because, for example, there exists an automorphism of $H_8$ carrying $i$ to $j$.


6. Yes. The proof of Proposition 7.7, which takes $S = G$, gives a finite presentation.


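7. In (a),
$$\begin{pmatrix}\sqrt2&0\\0&1/\sqrt2\end{pmatrix}\begin{pmatrix}1&x\\0&1\end{pmatrix}\begin{pmatrix}\sqrt2&0\\0&1/\sqrt2\end{pmatrix}^{-1}\begin{pmatrix}1&x\\0&1\end{pmatrix}^{-1} = \begin{pmatrix}1&2x\\0&1\end{pmatrix}\begin{pmatrix}1&-x\\0&1\end{pmatrix} = \begin{pmatrix}1&x\\0&1\end{pmatrix}.$$

In (b), we have also
$$\begin{pmatrix}1/\sqrt2&0\\0&\sqrt2\end{pmatrix}\begin{pmatrix}1&0\\x&1\end{pmatrix}\begin{pmatrix}1/\sqrt2&0\\0&\sqrt2\end{pmatrix}^{-1}\begin{pmatrix}1&0\\x&1\end{pmatrix}^{-1} = \begin{pmatrix}1&0\\x&1\end{pmatrix}$$
and, for $a > 0$,
$$\begin{pmatrix}\sqrt a&0\\0&1/\sqrt a\end{pmatrix}\begin{pmatrix}0&1\\-1&0\end{pmatrix}\begin{pmatrix}\sqrt a&0\\0&1/\sqrt a\end{pmatrix}^{-1}\begin{pmatrix}0&1\\-1&0\end{pmatrix}^{-1} = \begin{pmatrix}a&0\\0&a^{-1}\end{pmatrix}.$$
Thus all matrices $\begin{pmatrix}1&x\\0&1\end{pmatrix}$, $\begin{pmatrix}1&0\\x&1\end{pmatrix}$, and $\begin{pmatrix}a&0\\0&a^{-1}\end{pmatrix}$ with $a > 0$ are in $G'$. If $a > 0$, the equality
$$\begin{pmatrix}a&b\\c&d\end{pmatrix} = \begin{pmatrix}1&0\\c/a&1\end{pmatrix}\begin{pmatrix}a&0\\0&a^{-1}\end{pmatrix}\begin{pmatrix}1&b/a\\0&1\end{pmatrix}$$
exhibits $\begin{pmatrix}a&b\\c&d\end{pmatrix}$ as in $G'$. If $a \le 0$, we have $\begin{pmatrix}a&b\\c&d\end{pmatrix}\begin{pmatrix}1&0\\r&1\end{pmatrix} = \begin{pmatrix}a+br&b\\c+dr&d\end{pmatrix}$; if $b \ne 0$, then $a + br > 0$ for suitable $r$, and the equality $\begin{pmatrix}a&b\\c&d\end{pmatrix} = \begin{pmatrix}a+br&b\\c+dr&d\end{pmatrix}\begin{pmatrix}1&0\\-r&1\end{pmatrix}$ exhibits $\begin{pmatrix}a&b\\c&d\end{pmatrix}$ as in $G'$. Similarly if $c \ne 0$, then $a + cr > 0$ for suitable $r$, and hence $\begin{pmatrix}a&b\\c&d\end{pmatrix} = \begin{pmatrix}1&-r\\0&1\end{pmatrix}\begin{pmatrix}a+cr&b+dr\\c&d\end{pmatrix}$ exhibits $\begin{pmatrix}a&b\\c&d\end{pmatrix}$ as in $G'$. Thus all members of $G$ are in $G'$ except possibly for $\begin{pmatrix}a&0\\0&a^{-1}\end{pmatrix}$ with $a < 0$. So it is enough to prove that $\begin{pmatrix}-1&0\\0&-1\end{pmatrix}$ is in $G'$. This follows since $\begin{pmatrix}0&1\\-1&0\end{pmatrix} = \begin{pmatrix}1&1\\0&1\end{pmatrix}\begin{pmatrix}1&0\\-1&1\end{pmatrix}\begin{pmatrix}1&1\\0&1\end{pmatrix}$ has been shown to be in $G'$ and has square equal to $\begin{pmatrix}-1&0\\0&-1\end{pmatrix}$.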




In (c), suppose that $(xyx^{-1})y^{-1} = \begin{pmatrix}-1&0\\0&-1\end{pmatrix}$. Then $xyx^{-1} = -y$. Taking the trace of both sides and using the fact that $\operatorname{Tr} xyx^{-1} = \operatorname{Tr} y$, we see that $\operatorname{Tr} y = -\operatorname{Tr} y$ and $\operatorname{Tr} y = 0$. Put $x = \begin{pmatrix}r&s\\t&u\end{pmatrix}$ and $y = \begin{pmatrix}a&b\\c&-a\end{pmatrix}$, and substitute into the equality $xy = -yx$. The entry-by-entry equations are $ra + sc = -ra - tb$, $rb = -ub$, $uc = -rc$, and $tb - ua = -sc + ua$. The first and fourth equations together say that $2ra = -tb - sc = -2ua$. Thus we have $(r+u)a = 0$, $(r+u)b = 0$, and $(r+u)c = 0$. Since at least one of $a, b, c$ is nonzero, $r + u = 0$ and $x = \begin{pmatrix}r&s\\t&-r\end{pmatrix}$. Writing out the equality $xy = -yx$, we obtain the necessary and sufficient condition
$$2ra = -sc - tb. \qquad (*)$$
The determinant conditions are $-r^2 - st = 1$ and $-a^2 - bc = 1$. Multiplying $(*)$ by $sc$ and substituting $st = -1 - r^2$ and $bc = -1 - a^2$, we obtain $2rsac = -s^2c^2 - (-1 - r^2)(-1 - a^2)$ and then
$$0 = -s^2c^2 - 2rsac - 1 - a^2 - r^2 - r^2a^2 = -(ra + sc)^2 - 1 - a^2 - r^2,$$
a contradiction. Thus $\begin{pmatrix}-1&0\\0&-1\end{pmatrix}$ is not a commutator.
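Two of the computations above can be checked with exact arithmetic: the factorization used in (b) when $a \ne 0$, and the completed square at the end of (c). This sketch uses rational entries as a stand-in for real ones; the helper names `ldu_ok`, `random_det_one`, and `gap` are hypothetical.

```python
from fractions import Fraction
import random

def mat_mul(m, n):
    (a, b), (c, d) = m
    (e, f), (g, h) = n
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))

def ldu_ok(z):
    """Check z = [[1, 0], [c/a, 1]] [[a, 0], [0, 1/a]] [[1, b/a], [0, 1]] when det z = 1, a != 0."""
    (a, b), (c, d) = z
    lower = ((1, 0), (c / a, 1))
    diagonal = ((a, 0), (0, 1 / a))
    upper = ((1, b / a), (0, 1))
    return mat_mul(mat_mul(lower, diagonal), upper) == z

def random_det_one(rng):
    """Random rational matrix with a != 0 and d chosen so that ad - bc = 1."""
    a = Fraction(rng.choice([k for k in range(-9, 10) if k != 0]))
    b, c = Fraction(rng.randint(-9, 9)), Fraction(rng.randint(-9, 9))
    return ((a, b), (c, (1 + b * c) / a))

def gap(r, s, a, c):
    """(RHS - LHS) of 2rsac = -s^2 c^2 - (1 + r^2)(1 + a^2), the equation obtained in (c)."""
    return -s * s * c * c - (1 + r * r) * (1 + a * a) - 2 * r * s * a * c

rng = random.Random(7)
LDU_OK = all(ldu_ok(random_det_one(rng)) for _ in range(200))
GRID = range(-4, 5)
SQUARE_OK = all(gap(r, s, a, c) == -(r * a + s * c) ** 2 - 1 - a * a - r * r
                for r in GRID for s in GRID for a in GRID for c in GRID)
print(LDU_OK, SQUARE_OK)   # True True
```

Since the gap always equals $-(ra+sc)^2 - 1 - a^2 - r^2 \le -1$, it can never vanish, which is the contradiction in (c).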

8. By Proposition 7.8 the constructed group is a quotient of the group given by generators and relations. We actually have an isomorphism if each element of the group given by generators and relations is of the form $b^pa^q$ with $0 \le p \le 2$ and $0 \le q \le 8$, because the group given by generators and relations then has order $\le 27$. Right multiplication by $a$ carries this set to itself. Right multiplication by $b$ has $b^pa^qb = b^pb(b^{-1}a^qb) = b^{p+1}(b^{-1}ab)^q = b^{p+1}(a^4)^q = b^{p+1}a^{4q}$, and this equals a suitable element $b^{p'}a^{q'}$ with $0 \le p' \le 2$ and $0 \le q' \le 8$. Since the set contains 1 and since $a^{-1}$ and $b^{-1}$ are positive powers of $a$ and $b$, the set is the whole group. Hence the group defined by generators and relations has at most 27 elements, and we have the desired isomorphism.
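The normal form $b^pa^q$ turns the group into pairs $(p, q)$ with an explicit multiplication. The sketch assumes the relations $a^9 = b^3 = 1$ and $b^{-1}ab = a^4$, as the ranges $0 \le q \le 8$, $0 \le p \le 2$ and the computation above suggest; the rule $a^{q}b^{p} = b^{p}a^{4^p q}$ then gives the product. It checks associativity, the relations, and that $a$ and $b$ generate exactly 27 elements.

```python
from itertools import product

def mul(g, h):
    """(b^p1 a^q1)(b^p2 a^q2) = b^(p1+p2) a^(4^p2 q1 + q2), using b^-1 a b = a^4."""
    (p1, q1), (p2, q2) = g, h
    return ((p1 + p2) % 3, (pow(4, p2, 9) * q1 + q2) % 9)

ELEMENTS = [(p, q) for p in range(3) for q in range(9)]   # b^p a^q
ONE, A, B = (0, 0), (0, 1), (1, 0)
B_INV = (2, 0)

def power(g, n):
    result = ONE
    for _ in range(n):
        result = mul(result, g)
    return result

def generated(gens):
    """Close {1} under right multiplication by the generators (enough in a finite group)."""
    seen, frontier = {ONE}, [ONE]
    while frontier:
        g = frontier.pop()
        for s in gens:
            h = mul(g, s)
            if h not in seen:
                seen.add(h)
                frontier.append(h)
    return seen

ASSOCIATIVE = all(mul(mul(x, y), z) == mul(x, mul(y, z))
                  for x, y, z in product(ELEMENTS, repeat=3))
print(len(generated([A, B])), ASSOCIATIVE)   # 27 True
```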

9. Let $F_n$ be free on $x_1, y_1, \dots, x_n, y_n$, let $\varphi: F_n \to F_n/F_n'$ be the homomorphism of Corollary 7.5, and let $\Psi: F_n \to G_n$ be the given quotient homomorphism. Then $\ker\Psi \subseteq \ker\varphi$, and Proposition 4.11 shows that there exists a group homomorphism $\psi: G_n \to F_n/F_n'$ such that $\psi \circ \Psi = \varphi$. Since $F_n/F_n'$ is abelian, $\psi$ factors as $\psi = \bar\psi \circ q$, where $q: G_n \to G_n/G_n'$ is the quotient and $\bar\psi: G_n/G_n' \to F_n/F_n'$ is a homomorphism. Thus $\bar\psi \circ q \circ \Psi = \varphi$. Since $\varphi$ is onto, $\bar\psi$ is onto; thus the image of $\bar\psi$ is isomorphic with $F_n/F_n'$, which is free abelian of rank $2n$. The group $G_n/G_n'$ is abelian and has a generating set of $2n$ generators, thus is a homomorphic image $\xi: A_n \to G_n/G_n'$, where $A_n$ is free abelian with $2n$ generators. The composition $\bar\psi \circ \xi$ is a homomorphism from a free abelian group of rank $2n$ onto a free abelian group of rank $2n$. Taking into account the proof of Theorem 4.46, we see that $\bar\psi \circ \xi$ is one-one. Since $\xi$ is onto $G_n/G_n'$, $\bar\psi$ is one-one. Therefore $G_n/G_n'$ is free abelian of rank $2n$.

10. Let $F$ be a free group of rank $n$, let $q: F \to F/F'$ be the quotient homomorphism, let $x_1, \dots, x_k$ with $k < n$ be generators of $F$, let $\widetilde F = F(\{x_1, \dots, x_k\})$, and let $\Phi: \widetilde F \to F$ be the quotient homomorphism. The composition $q \circ \Phi$ is a homomorphism of $\widetilde F$ onto the abelian group $F/F'$, and it factors through to a homomorphism of $\widetilde F/\widetilde F'$ onto $F/F'$. Here the domain is abelian with $k$ generators, and the image is free abelian with $n$ generators, and there can be no such homomorphism.

11. For (a), we can use 1 and $a$. For (b), the proof of Theorem 7.10 says that we are to multiply each of these by $a$, $b$, $c$ on the right and take the $H$ part of the result. The $H$ parts that are not 1 form a free basis. We have $1a = a$ and $1a\,p(a)^{-1} = 1$; $1b = ba^{-1}a$ and $1b\,p(b)^{-1} = ba^{-1}$; $1c = c = ca^{-1}a$ and $1c\,p(c)^{-1} = ca^{-1}$; $aa = a^2\cdot 1$ and $aa\,p(a^2)^{-1} = a^2$; $ab = ab\cdot 1$ and $ab\,p(ab)^{-1} = ab$; and $ac = ac\cdot 1$ and $ac\,p(ac)^{-1} = ac$. Thus a free basis of the generated subgroup is $\{ba^{-1}, ca^{-1}, a^2, ab, ac\}$.
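The Schreier-style computation in (b) is easy to mechanize. The sketch below assumes, as the computation above indicates, that $H$ is the index-2 subgroup of the free group on $a, b, c$ consisting of words of even length, so that the coset representatives are $1$ and $a$. Words are lists of (letter, exponent) pairs; the helper names are illustrative.

```python
# Words in the free group on a, b, c are lists of (letter, exponent), exponent +1 or -1.

def reduce_word(word):
    """Free reduction: cancel adjacent x x^{-1} pairs."""
    out = []
    for letter, exp in word:
        if out and out[-1] == (letter, -exp):
            out.pop()
        else:
            out.append((letter, exp))
    return out

def inverse(word):
    return [(letter, -exp) for letter, exp in reversed(word)]

def representative(word):
    """Coset representative (1 or a) for the assumed H = {words of even length}."""
    return [] if sum(exp for _, exp in word) % 2 == 0 else [("a", 1)]

def show(word):
    return "".join(x if e == 1 else x + "^-1" for x, e in word) or "1"

TRANSVERSAL = [[], [("a", 1)]]          # the set {1, a} from part (a)
GENERATORS = ["a", "b", "c"]

schreier = []
for t in TRANSVERSAL:
    for s in GENERATORS:
        ts = t + [(s, 1)]
        h = reduce_word(ts + inverse(representative(ts)))   # H part of ts
        if h:                                               # discard parts equal to 1
            schreier.append(show(h))

print(schreier)   # ['ba^-1', 'ca^-1', 'aa', 'ab', 'ac']
```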

12. The thing to prove, by induction on $n$, is that if $a_1a_2\cdots a_n$ is a reduced word in variables $u_0, u_1, u_2, \dots$ and their inverses, and if we then substitute $x^kyx^{-k}$ for $u_k$ and reduce in terms of $x, y$, then the reduced form involves a total of $n$ factors of $y$ or $y^{-1}$, the factor to the left of the first $y$ or $y^{-1}$ is $x^p$ if $a_1 = u_p^{\pm1}$, and the factor to the right of the last $y$ or $y^{-1}$ is $x^{-q}$ if $a_n = u_q^{\pm1}$.

13. The remarks with Proposition 7.15 show that the reduced words in $C_2 * C_2$ are all words whose terms are alternately $x$ and $y$. Let $H$ be a normal subgroup $\ne \{1\}$. Then $H$ contains a nontrivial such word. Form the shortest such word $\ne 1$ in $H$. If the word begins and ends with $x$ and has length $> 1$, we can conjugate by $x$ and reduce the length by 2; similarly if it begins and ends with $y$ and has length $> 1$, we can conjugate by $y$ and reduce the length by 2. We conclude that if the word begins and ends with the same letter, it has length 1. Then $H$ contains $x$ or $y$, and $G/H$ is a quotient of either $\langle y; y^2\rangle$ or $\langle x; x^2\rangle$, which give $C_2$ and $\{1\}$.

Thus we may assume that a shortest nontrivial reduced word in $H$ is a product $xy\cdots xy$ with $2n$ factors or a product $yx\cdots yx$ with $2n$ factors. Then $G/H$ is a quotient of $\langle a, b; a^2, b^2, (ab)^n\rangle$, and we saw in an example in Section 2 that this group is $D_n$. We readily check that all quotients of $D_n$ are of the form $\{1\}$, $C_2$, $C_2 \times C_2$, and $D_m$ for certain values of $m \ge 3$.

14. Argument #1: When the irreducible representations are all 1-dimensional, Corollary 7.25 shows that the number of irreducible representations must be $|G|$, and Corollary 7.28 shows that the number of conjugacy classes must be $|G|$. Therefore each conjugacy class contains just one element, and $G$ is abelian.

Argument #2: Theorem 7.24 shows that the irreducible representations separate points in $G$ in the sense that for any pair $x \ne y$ in the group, there is some irreducible $R$ with $R(x) \ne R(y)$. When the irreducible representations are all 1-dimensional, the multiplicative characters separate points. Since every multiplicative character is trivial on the commutator subgroup, the commutator subgroup must be $\{1\}$. Then every pair $x, y$ has $xyx^{-1}y^{-1} = 1$ and $xy = yx$.

15.  This  is  immediate  from  Lemma  7.11. 

16. For (a), every cochain $f$ has the property that $mf = 0$. Hence the same thing is true of cocycles and of cohomology elements.

For (b), the cocycle condition for $f$ says that
$$0 = g_1f(g_2, \dots, g_{n+1}) + \sum_{i=1}^{n-1}(-1)^i f(g_1, \dots, g_{i-1}, g_ig_{i+1}, g_{i+2}, \dots, g_{n+1}) + (-1)^n f(g_1, \dots, g_{n-1}, g_ng_{n+1}) + (-1)^{n+1}f(g_1, \dots, g_n).$$
Summing over $g_{n+1}$ in $G$ gives, with $F(g_1, \dots, g_{n-1}) = \sum_{g \in G} f(g_1, \dots, g_{n-1}, g)$,
$$(-1)^n|G|f(g_1, \dots, g_n) = g_1(F(g_2, \dots, g_n)) + \sum_{i=1}^{n-1}(-1)^i F(g_1, \dots, g_{i-1}, g_ig_{i+1}, g_{i+2}, \dots, g_n) + (-1)^n F(g_1, \dots, g_{n-1}).$$
The right side we recognize as $(\delta_{n-1}F)(g_1, \dots, g_n)$, which is the value of a coboundary at $(g_1, \dots, g_n)$. Therefore $|G|f$ is a coboundary and becomes the 0 element in $H^n(G, N)$. Thus $f$, when regarded as an element of $H^n(G, N)$, has order dividing $|G|$.

17. The two parts of the previous problem show that every element of $H^2(G, N)$ is of finite order dividing both $|G|$ and $|G/N|$. Since $\mathrm{GCD}(|G|, |G/N|) = 1$, every element of $H^2(G, N)$ has order 1. Thus $H^2(G, N) = 0$, and the only extension is the semidirect product.

18. The only automorphism of $C_2$ is the trivial automorphism, and therefore $\tau$ is trivial. The two possibilities for $G$ are $C_2 \times C_2$ and $C_4$. With $G = C_2 \times C_2$, the group $E$ can be $C_2 \times C_2 \times C_2$ or $H_8$, and with $G = C_4$, $E$ can be $C_2 \times C_4$ or $C_8$. For the cases $E = C_2 \times C_2 \times C_2$ and $E = C_2 \times C_4$, the extension is the direct product, and no further discussion is necessary. For the cases $E = H_8$ and $E = C_8$, the embedding of $N = C_2$ is unique, and we therefore get only one extension in each case. Thus there are exactly two inequivalent extensions for each choice of $G$.

19. If $N$ embeds as a summand $C_2$, then the quotient $E/N$ has one fewer summand $C_2$, is still the countable direct sum of copies of $C_2$ and $C_4$, and is therefore isomorphic to $E$. If $N$ embeds as a 2-element subgroup of a summand $C_4$, then the quotient $E/N$ has one fewer summand $C_4$ and one more summand $C_2$, is still the countable direct sum of copies of $C_2$ and $C_4$, and is therefore isomorphic to $E$.

The action $\tau$ has to be trivial because $C_2$ has only the trivial automorphism.

If an equivalence $\Phi$ of extensions were to exist, it would have to satisfy $\Phi(i_1(x)) = i_2(x)$ for the nontrivial element $x$ of $N = C_2$. But $i_1(x)$ is an element of order 2 that is not the square of an element of order 4, while $i_2(x)$ is an element of order 2 that is the square of an element of order 4. Since $\Phi$ is an isomorphism, it has to carry nonsquares to nonsquares, and we cannot have $\Phi(i_1(x)) = i_2(x)$.
20. Let us write $i_1$ and $i_2$ for the inclusions of $N$ into $E_1$ and $E_2$. For $(i_1(x), 1)$ to be in $Q$, $i_1(x)$ must be 1; hence $x$ must be 1. Thus $x \mapsto (i_1(x), 1)Q$ is one-one. The image of $\varphi$ is the same as the image of $\varphi_1$, which is $G$. Suppose that $(e_1, e_2)$ is in $(E_1, E_2) \cap Q$. Then $\varphi_1(e_1) = \varphi_2(e_2)$ and $(e_1, e_2) = (i_1(x), i_2(x)^{-1})$ for some $x \in N$. Then $\varphi(e_1, e_2) = \varphi_1(i_1(x)) = 1$, and $\varphi$ descends to the quotient.

If $(e_1, e_2)Q$ is in the kernel of the descended $\varphi$, then $(e_1, e_2)$ is in the kernel of the original $\varphi$, and $e_1$ is in the kernel of $\varphi_1$. Therefore $e_1 = i_1(x)$ for some $x \in N$. Since $\varphi_2(e_2) = \varphi_1(e_1)$, $e_2$ is in the kernel of $\varphi_2$ and $e_2 = i_2(y)$ for some $y \in N$. The element $(i_1(y), i_2(y)^{-1})$ is in $Q$, and we therefore have $(i_1(x), i_2(y))Q = (i_1(x), i_2(y))(i_1(y), i_2(y)^{-1})Q = (i_1(xy), 1)Q$. Thus $(i_1(x), i_2(y))Q$ is exhibited as in the image of the embedded copy of $N$.

21. Since $Q$ is normal, we have
$$(u,u)(v,v)Q = (a(u,v)uv, b(u,v)uv)Q = (b(u,v), b(u,v)^{-1})(a(u,v)uv, b(u,v)uv)Q = (b(u,v)a(u,v)uv, 1uv)Q = (b(u,v)a(u,v), 1)(uv, uv)Q.$$
Thus the cocycle for $(E_1, E_2)/Q$ is $\{b(u,v)a(u,v)\} = \{a(u,v)b(u,v)\}$.

22. Let $\Phi_1: E_1 \to E_1'$ and $\Phi_2: E_2 \to E_2'$ be isomorphisms exhibiting the equivalences of the extensions. Define $\Phi(e_1, e_2) = (\Phi_1(e_1), \Phi_2(e_2))Q'$, and check that this descends to the required isomorphism $\Phi: (E_1, E_2)/Q \to (E_1', E_2')/Q'$.

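23. For $\chi$ with $\chi|_H = 1$, $\hat f(\chi) = \sum_{t\in G} f(t)\overline{\chi(t)} = \sum_{t\in G/H}\sum_{h\in H} f(t+h)\overline{\chi(t)} = \sum_{t\in G/H} F(t)\overline{\chi(t)} = \hat F(\chi)$.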

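24. Fourier inversion and Problem 23 give $F(t) = |G/H|^{-1}\sum_{\chi\in\widehat{G/H}} \hat F(\chi)\chi(t) = |G/H|^{-1}\sum_{\chi\in\hat G,\ \chi|_H = 1} \hat f(\chi)\chi(t)$. Pulling back $\chi$ to the member $\chi$ of $\hat G$ with $\chi|_H = 1$ and substituting the definition of $F$, we obtain the desired result.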

25. For (a), if $C = 0$, then all $a \in \mathbb{F}^n$ have $(a, 0) = 0$, and hence $C^\perp = \mathbb{F}^n$. For (b), the repetition code has $C = \{0, (1, \dots, 1)\}$. The members $a$ of $\mathbb{F}^n$ with $(a, (1, \dots, 1)) = 0$ are the members of even weight, hence the members of the parity-check code. For (c), it is enough to check that $(a, c) = 0$ for each pair of members $a, c$ of a basis of $C$, and this one can do by hand.

For (d), Proposition 6.3 shows that $n = \dim C + \dim C^\perp$. Since $C = C^\perp$, $\dim C = n/2$.

For (e), every member $c$ of $C$ is in $C^\perp$ and must in particular have $(c, c) = 0$. Therefore $c$ has even weight.

For (f), let $c$ and $c'$ be in $C$, and write $cc'$ for the entry-by-entry product (logical "and"). Then $\mathrm{wt}(c + c') = \mathrm{wt}(c) + \mathrm{wt}(c') - 2\,\mathrm{wt}(cc')$, and hence $\tfrac12\mathrm{wt}(c + c') = \tfrac12\mathrm{wt}(c) + \tfrac12\mathrm{wt}(c') - \mathrm{wt}(cc')$. Considering this equality modulo 2 shows that it is enough to prove that $C \subseteq C^\perp$ implies that $\mathrm{wt}(cc')$ is even whenever $c$ and $c'$ are in $C$. Modulo 2, we have $\mathrm{wt}(cc') \equiv (c, c')$, and $(c, c') = 0$ since $C \subseteq C^\perp$.
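As a concrete instance of (d) through (f), the sketch below uses the $[8,4]$ extended Hamming code, an example chosen here rather than taken from the text. It computes $C^\perp$ by brute force, confirms $C = C^\perp$ with $\dim C = n/2$, checks the identity $\mathrm{wt}(c + c') = \mathrm{wt}(c) + \mathrm{wt}(c') - 2\,\mathrm{wt}(cc')$, and confirms that every weight is divisible by 4.

```python
from itertools import product

N = 8
# Example code (an assumption for illustration): the [8, 4] extended Hamming code.
GENERATORS = [
    (1, 1, 1, 1, 0, 0, 0, 0),
    (0, 0, 1, 1, 1, 1, 0, 0),
    (0, 0, 0, 0, 1, 1, 1, 1),
    (0, 1, 0, 1, 0, 1, 0, 1),
]

def add(u, v):
    return tuple((s + t) % 2 for s, t in zip(u, v))

def meet(u, v):
    """Entry-by-entry product cc' (logical "and")."""
    return tuple(s * t for s, t in zip(u, v))

def dot(u, v):
    return sum(meet(u, v)) % 2

def wt(u):
    return sum(u)

def span(gens):
    words = set()
    for coeffs in product((0, 1), repeat=len(gens)):
        w = (0,) * N
        for k, g in zip(coeffs, gens):
            if k:
                w = add(w, g)
        words.add(w)
    return words

C = span(GENERATORS)
C_PERP = {v for v in product((0, 1), repeat=N) if all(dot(v, c) == 0 for c in C)}
IDENTITY_OK = all(wt(add(c, d)) == wt(c) + wt(d) - 2 * wt(meet(c, d)) for c in C for d in C)

print(len(C), C == C_PERP, sorted({wt(c) for c in C}), IDENTITY_OK)
# 16 True [0, 4, 8] True
```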

26. In (a), every element of $\mathbb{F}^n$ has order at most 2, and thus $\chi$ takes only the values $\pm1$. Define $(a_\chi)_i$ to be 0 if $\chi(e_i) = +1$ and to be 1 if $\chi(e_i) = -1$. Then $\chi(e_i) = (-1)^{(a_\chi, e_i)}$ for all $i$. The two sides extend uniquely as homomorphisms of $\mathbb{F}^n$ to $\{\pm1\}$, and it follows that $\chi(c) = (-1)^{(a_\chi, c)}$ for all $c \in \mathbb{F}^n$. The remainder of (a) is routine.

In (b), let $\chi$ correspond to $a$. Then $\hat f(a) = \hat f(\chi) = \sum_{c\in\mathbb{F}^n} f(c)\overline{\chi(c)} = \sum_{c\in\mathbb{F}^n} f(c)(-1)^{(a,c)}$.

In (c), we have
$$\prod_i \hat f_i(a_i) = \prod_i \sum_{c_i\in\mathbb{F}} f_i(c_i)(-1)^{a_ic_i} = \sum_{c_1\in\mathbb{F}} f_1(c_1)(-1)^{a_1c_1}\cdots\sum_{c_n\in\mathbb{F}} f_n(c_n)(-1)^{a_nc_n}$$
$$= \sum_{c\in\mathbb{F}^n} f_1(c_1)\cdots f_n(c_n)(-1)^{a_1c_1+\cdots+a_nc_n} = \sum_{c\in\mathbb{F}^n} f(c)(-1)^{(a,c)} = \hat f(a).$$

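27. In (a), $\hat f_0(0) = \sum_{c_0\in\mathbb{F}} f_0(c_0)(-1)^{0c_0} = f_0(0)(+1) + f_0(1)(+1) = x + y$ and
$\hat f_0(1) = \sum_{c_0\in\mathbb{F}} f_0(c_0)(-1)^{1c_0} = f_0(0)(+1) + f_0(1)(-1) = x - y$.

In (b), Problem 26c gives
$$\hat f(a) = \prod_{i=1}^n \hat f_0(a_i) = \Big(\prod_{i \text{ with } a_i = 0}(x + y)\Big)\Big(\prod_{i \text{ with } a_i = 1}(x - y)\Big) = (x + y)^{n - \mathrm{wt}(a)}(x - y)^{\mathrm{wt}(a)}.$$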
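The transform of Problem 26b and the closed form of Problem 27b can be compared directly on all of $\mathbb{F}^n$ for small $n$. In this sketch the sample values of $x$, $y$, and $n$ are arbitrary choices, and the function names are illustrative.

```python
from itertools import product

def walsh_hadamard(f, n):
    """hat f(a) = sum over c in F_2^n of f(c) (-1)^{(a, c)}, as in Problem 26b."""
    return {a: sum(f[c] * (-1) ** sum(s * t for s, t in zip(a, c)) for c in f)
            for a in product((0, 1), repeat=n)}

def weight_function(x, y, n):
    """f(c) = x^{n - wt(c)} y^{wt(c)}, the product f_0(c_1) ... f_0(c_n) of Problem 27."""
    return {c: x ** (n - sum(c)) * y ** sum(c) for c in product((0, 1), repeat=n)}

def closed_form_holds(x, y, n):
    """Compare the transform with (x + y)^{n - wt(a)} (x - y)^{wt(a)} from Problem 27b."""
    f_hat = walsh_hadamard(weight_function(x, y, n), n)
    return all(value == (x + y) ** (n - sum(a)) * (x - y) ** sum(a)
               for a, value in f_hat.items())

print(closed_form_holds(3, 2, 4), closed_form_holds(7, -2, 5))   # True True
```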

28. In (a), the members of $\widehat{G/H}$ lift exactly to the members $\chi$ of $\hat G$ with $\chi|_H = 1$. Under the mapping of Problem 26a, any member $\chi$ of $\hat G$ yields a unique member $a_\chi$ of $\mathbb{F}^n$ with $\chi(c) = (-1)^{(a_\chi, c)}$ for all $c \in \mathbb{F}^n$. If $a_\chi$ is in $C^\perp$, then this formula gives $\chi(c) = 1$ for all $c \in C$, i.e., $\chi|_H = 1$. If $a_\chi$ is not in $C^\perp$, then $\chi(c_0) \ne 1$ for some $c_0 \in C$, i.e., $\chi|_H \ne 1$.

In (b), we apply the special case of Problem 24 mentioned in the educational note. Then the result is immediate, in view of (a).

In (c), we let $f(c) = x^{n - \mathrm{wt}(c)}y^{\mathrm{wt}(c)}$. Problem 27b says that $\hat f(a) = (x + y)^{n - \mathrm{wt}(a)}(x - y)^{\mathrm{wt}(a)}$. Substituting into the formula of the previous part gives $\sum_{c\in C} x^{n - \mathrm{wt}(c)}y^{\mathrm{wt}(c)} = |C^\perp|^{-1}\sum_{a\in C^\perp}(x + y)^{n - \mathrm{wt}(a)}(x - y)^{\mathrm{wt}(a)}$, and this says that $W_C(x, y) = |C^\perp|^{-1}W_{C^\perp}(x + y, x - y)$.

In (e), parts (d) and (e) of Problem 25 show that the only monomials $X^kY^l$ in $W_C(X, Y)$ with nonzero coefficients are those with $k$ and $l$ even. Therefore $W_C(X, Y)$ is invariant under the transformations $X \mapsto -X$ and $Y \mapsto -Y$. The MacWilliams identity shows that $W_C(X, Y)$, apart from a constant, is the same polynomial in $X + Y$ and $X - Y$. Therefore $W_C(X, Y)$ is invariant also under $(X + Y) \mapsto -(X + Y)$ and under $(X - Y) \mapsto -(X - Y)$. Thus $W_C(X, Y)$ is invariant under the group of symmetries of a regular octagon centered at 0 with one of its sides centered at $(1, 0)$. This symmetry group is $D_8$.
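The identity $W_C(x, y) = |C^\perp|^{-1}W_{C^\perp}(x + y, x - y)$ of (c) can be spot-checked at integer points, using exact rationals for the division. The codes below (a repetition code, a small 2-dimensional code, and the self-dual $[8,4]$ extended Hamming code) are examples chosen for this sketch; for the self-dual code it also checks the sign invariances used in (e).

```python
from fractions import Fraction
from itertools import product

def dual(code, n):
    """C-perp: all v in F_2^n with (v, c) = 0 for every c in C."""
    return {v for v in product((0, 1), repeat=n)
            if all(sum(s * t for s, t in zip(v, c)) % 2 == 0 for c in code)}

def enumerator(code, n):
    """W_C(x, y) = sum over c in C of x^{n - wt(c)} y^{wt(c)}, returned as a function."""
    return lambda x, y: sum(x ** (n - sum(c)) * y ** sum(c) for c in code)

def span(gens, n):
    words = {(0,) * n}
    for g in gens:
        words |= {tuple((s + t) % 2 for s, t in zip(w, g)) for w in words}
    return words

def macwilliams_holds(code, n, points):
    """Check W_C(x, y) = |C-perp|^{-1} W_{C-perp}(x + y, x - y) at sample points."""
    perp = dual(code, n)
    w, w_perp = enumerator(code, n), enumerator(perp, n)
    return all(w(x, y) == Fraction(w_perp(x + y, x - y), len(perp)) for x, y in points)

POINTS = [(x, y) for x in range(-3, 4) for y in range(-3, 4)]
REPETITION = {(0,) * 5, (1,) * 5}
SMALL = span([(1, 1, 0, 0, 1, 0), (0, 1, 1, 1, 0, 0)], 6)
HAMMING8 = span([(1, 1, 1, 1, 0, 0, 0, 0), (0, 0, 1, 1, 1, 1, 0, 0),
                 (0, 0, 0, 0, 1, 1, 1, 1), (0, 1, 0, 1, 0, 1, 0, 1)], 8)

# Part (e) for the self-dual code: W(X, Y) = W(-X, Y) = W(X, -Y) and 16 W(X, Y) = W(X + Y, X - Y).
W8 = enumerator(HAMMING8, 8)
SYMMETRIES_OK = all(W8(x, y) == W8(-x, y) == W8(x, -y) and 16 * W8(x, y) == W8(x + y, x - y)
                    for x, y in POINTS)

print(macwilliams_holds(REPETITION, 5, POINTS), macwilliams_holds(SMALL, 6, POINTS),
      macwilliams_holds(HAMMING8, 8, POINTS), SYMMETRIES_OK)
```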

29. The characters of $G$ are the ones with $\chi_n(1) = e^{2\pi in/m}$ for $0 \le n < m$. Such a character is trivial on $H$ if and only if $\chi_n(q) = 1$, i.e., if and only if $e^{2\pi inq/m} = 1$; this means that $nq$ is a multiple of $m$, hence that $n$ is a multiple of $p$.

The element 1 of $H$ is the element $q$ of $G$. Thus the question about the identification of the descended characters asks the value of $\chi_n(1)$ when $n$ is a multiple $jp$ of $p$. The value is $\chi_{jp}(1) = e^{2\pi ijp/m} = e^{2\pi ij/q}$.

If we have computed $F$ on $G/H$ and want to compute $\hat F$ from the definition of Fourier coefficients, we have to multiply each of the $q$ values of $F$ by the values of each of the $q$ characters of $G/H$ and then add. The number of multiplications is $q^2$. The actual computation of $F$ from $f$ involves $p$ additions for each of the $q$ values of $t$, hence $pq$ additions.

30. $\hat f(jp + k) = \sum_{t=0}^{m-1} f(t)e^{-2\pi i(jp+k)t/m} = \sum_{t=0}^{m-1}\big(f(t)e^{-2\pi ikt/m}\big)e^{-2\pi ijpt/m}$. The variant of $f$ for the number $k$ is then $t \mapsto f(t)e^{-2\pi ikt/m}$. Handling each value of $k$ involves $m = pq$ steps to compute the variant of $f$ and then the $q^2 + pq$ steps of Problem 29. Thus we have $q^2 + 2pq$ steps for each $k$, which we regard as of order $q^2 + pq$. This means $p(q^2 + pq)$ steps when all $k$'s are counted, hence $pq(p + q)$ steps.
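The two-stage scheme of Problems 29 and 30 can be written out and compared with direct evaluation. This sketch assumes the convention $\hat f(n) = \sum_t f(t)e^{-2\pi int/m}$ on $G = \mathbb{Z}/m\mathbb{Z}$ with $m = pq$ and $H = \{0, q, 2q, \dots\}$ of order $p$; the comments mark the operation counts from the text. It is an illustration of the counting argument, not an optimized FFT.

```python
import cmath
import random

def naive_dft(f):
    """hat f(n) = sum_t f(t) e^{-2 pi i n t/m}, straight from the definition."""
    m = len(f)
    return [sum(f[t] * cmath.exp(-2j * cmath.pi * n * t / m) for t in range(m))
            for n in range(m)]

def two_stage_dft(f, p, q):
    """The scheme of Problems 29-30 for m = pq, with H = {0, q, 2q, ...} of order p."""
    m = p * q
    assert len(f) == m
    out = [0j] * m
    for k in range(p):
        # Variant of f for the number k: t -> f(t) e^{-2 pi i k t/m}  (m multiplications).
        g = [f[t] * cmath.exp(-2j * cmath.pi * k * t / m) for t in range(m)]
        # Push down to G/H = Z/qZ: F(s) = sum of g over the coset s + H  (pq additions).
        F = [sum(g[s + h * q] for h in range(p)) for s in range(q)]
        # Fourier coefficients on G/H, i.e. the characters chi_{jp}  (q^2 multiplications).
        for j in range(q):
            out[j * p + k] = sum(F[s] * cmath.exp(-2j * cmath.pi * j * s / q)
                                 for s in range(q))
    return out

def max_error(p, q, seed=0):
    rng = random.Random(seed)
    f = [complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(p * q)]
    return max(abs(u - v) for u, v in zip(two_stage_dft(f, p, q), naive_dft(f)))

print(max_error(3, 4), max_error(5, 2))
```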

32. By inspection, $(\ell_{v_1}, \ell_{v_2})_{V'} = (v_2, v_1)_V$ has the properties of an inner product. The definition is set up so that the conjugate-linear mapping $\ell_v \mapsto v$ of $V'$ onto $V$ preserves inner products up to complex conjugation.

33. The contragredient has $(R^c(x)\ell_v)(v') = \ell_v(R(x^{-1})v') = (R(x^{-1})v', v)_V = (v', R(x)v)_V = \ell_{R(x)v}(v')$. Hence $R^c(x)\ell_v = \ell_{R(x)v}$, and
$$(R^c(x)\ell_v, R^c(x)\ell_{v'})_{V'} = (\ell_{R(x)v}, \ell_{R(x)v'})_{V'} = (R(x)v', R(x)v)_V = (v', v)_V = (\ell_v, \ell_{v'})_{V'}.$$

34. If $\{v_j\}$ is an orthonormal basis of $V$, then $\{\ell_{v_j}\}$ is an orthonormal basis of $V'$ by Problem 32, and $(R^c(x)\ell_{v_j}, \ell_{v_j})_{V'} = (\ell_{R(x)v_j}, \ell_{v_j})_{V'} = (v_j, R(x)v_j)_V = \overline{(R(x)v_j, v_j)_V}$. Summing on $j$ gives the desired equality of group characters.

35. In view of Problem 34, a necessary condition on a 1-dimensional representation for it to be equivalent to its contragredient is that it be real-valued. Hence the two nontrivial multiplicative characters of $C_3$ are not equivalent to their contragredients.

36. Following the notation in the discussion before Theorem 7.23, let $\rho_{ij}(x) = (R(x)u_j, u_i)$, let $l$ be the left-regular representation, and let $\ell_v(u) = (u, v)_V$ be as above. Consider, for fixed $j_0$, the image of $R^c(g)\ell_{u_i}$ under the linear extension to $V'$ of the map $E'(\ell_{u_k})(x) = (R(x)u_{j_0}, u_k)_V$. This is
$$E'(\ell_{\sum_k c_ku_k})(x) = E'\Big(\sum_k \bar c_k\ell_{u_k}\Big)(x) = \sum_k \bar c_kE'(\ell_{u_k})(x) = \sum_k \bar c_k(R(x)u_{j_0}, u_k)_V = \Big(R(x)u_{j_0}, \sum_k c_ku_k\Big)_V,$$
and hence $E'(\ell_v)(x) = (R(x)u_{j_0}, v)_V$. Then the image of interest is
$$E'(R^c(g)\ell_{u_i})(x) = E'(\ell_{R(g)u_i})(x) = (R(x)u_{j_0}, R(g)u_i)_V = (R(g^{-1}x)u_{j_0}, u_i)_V = (l(g)\rho_{ij_0})(x).$$
Therefore $l$ carries a column of matrix coefficients to itself and is equivalent on such a column to $R^c$.

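37. Let $x = \begin{pmatrix}0&-1\\1&0\end{pmatrix}$ and $y = \begin{pmatrix}0&1\\-1&-1\end{pmatrix}$, and let $\Gamma$ be the subgroup generated by $x$ and $y$. Observe that $-I = x^2$, $y^{-1} = \begin{pmatrix}-1&-1\\1&0\end{pmatrix}$, and $yx = \begin{pmatrix}1&0\\-1&1\end{pmatrix}$ are in $\Gamma$.

Arguing by contradiction, suppose that $\Gamma \ne SL(2, \mathbb{Z})$. Choose a matrix $z = \begin{pmatrix}a&b\\c&d\end{pmatrix}$ in $SL(2, \mathbb{Z})$ but not in $\Gamma$ such that $\max(|a|, |b|)$ is as small as possible. If $ab = 0$, then one of $|a|$ and $|b|$ is 1 and the other is 0 because the matrix has determinant 1. If $a = 0$, then $zy^{-1}$ has top row $(\pm1\ \ 0)$; so in either event we see that some member of $SL(2, \mathbb{Z})$ outside $\Gamma$ is of the form $\pm\begin{pmatrix}1&0\\n&1\end{pmatrix}$. Since $x^2 = -I$ is in $\Gamma$ and $(yx)^{-n} = \begin{pmatrix}1&0\\n&1\end{pmatrix}$ is in $\Gamma$, this is a contradiction.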


Thus the matrix $z$ cannot have $ab = 0$. Suppose that $ab > 0$. Then $zy$ has top row $(-b\ \ a - b)$, and $zy^{-1}$ has top row $(-a + b\ \ -a)$. The minimality of $\max(|a|, |b|)$ for $z$ says that
$$\max(|a|, |b|) \le \max(|-b|, |a - b|) \quad\text{and}\quad \max(|a|, |b|) \le \max(|-a + b|, |-a|).$$
Now $|a - b| < \max(|a|, |b|)$ since $ab > 0$, and the only way that we can have the above inequalities is if $a = b$. In this case, $zy$ is a member of $SL(2, \mathbb{Z})$ outside $\Gamma$ whose top-row entries have product 0, and we have seen that this is a contradiction.

Thus we must have $ab < 0$. Then $zx$ has top row $(b\ \ -a)$. The product of these entries is positive and the maximum of their absolute values is the same as that for $z$. So we are reduced to the situation in the previous paragraph, which we saw leads to a contradiction. We conclude that $\Gamma = SL(2, \mathbb{Z})$.
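The descent in Problem 37 is effectively an algorithm: right multiplication by $x$, $y$, or $y^{-1}$ never increases $\max(|a|, |b|)$ and eventually makes $ab = 0$, after which $z = \pm(yx)^{-k}$. The sketch below records those steps and inverts them to write any $z \in SL(2, \mathbb{Z})$ as a word. `word_for` is a hypothetical helper, and the letter `v` stands for $y^{-1}$.

```python
import random

I = ((1, 0), (0, 1))
X = ((0, -1), (1, 0))        # x, with x^2 = -I
Y = ((0, 1), (-1, -1))       # y, with y^3 = I
V = ((-1, -1), (1, 0))       # v = y^{-1}
LETTERS = {"x": X, "y": Y, "v": V}
INVERSE = {"x": "xxx", "y": "v", "v": "y"}   # x^{-1} = x^3

def mul(m, n):
    (a, b), (c, d) = m
    (e, f), (g, h) = n
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))

def evaluate(word):
    result = I
    for letter in word:
        result = mul(result, LETTERS[letter])
    return result

def word_for(z):
    """Hypothetical helper: write z in SL(2, Z) as a word in x, y, v = y^{-1}.

    Follows the descent of Problem 37 until the top row (a, b) has ab = 0."""
    right = []                       # z_original * r_1 * r_2 * ... = current z
    while True:
        (a, b), _ = z
        if a * b == 0:
            break
        if a * b < 0:
            step = "x"               # top row becomes (b, -a), product > 0
        elif abs(a) >= abs(b):
            step = "y"               # top row becomes (-b, a - b)
        else:
            step = "v"               # top row becomes (b - a, -a)
        z = mul(z, LETTERS[step])
        right.append(step)
    if z[0][0] == 0:                 # top row (0, +-1): one more v gives (+-1, 0)
        z = mul(z, V)
        right.append("v")
    (s, _), (c, _) = z               # now z = [[s, 0], [c, s]] with s = +-1
    k = s * c                        # z = s * (yx)^(-k)
    middle = "xxxv" * k if k > 0 else "yx" * (-k)    # (yx)^{-1} = x^3 v
    sign = "xx" if s == -1 else ""   # x^2 = -I
    tail = "".join(INVERSE[r] for r in reversed(right))
    return sign + middle + tail

rng = random.Random(1)
SAMPLES = [evaluate("".join(rng.choice("xyv") for _ in range(rng.randint(0, 30))))
           for _ in range(300)]
print(all(evaluate(word_for(z)) == z for z in SAMPLES))
```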

38. In $PSL(2, \mathbb{Z})$, we have $x^2 = y^3 = 1$, and Problem 37 shows that $x$ and $y$ generate $PSL(2, \mathbb{Z})$. Proposition 7.8 therefore produces a homomorphism carrying $\langle X, Y; X^2, Y^3\rangle$ onto $PSL(2, \mathbb{Z})$. Proposition 7.16 shows that $C_2 * C_3 \cong \langle X, Y; X^2, Y^3\rangle$, and the composition of these two maps yields the desired homomorphism $\Phi$.


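39. Let us drop the "mod $\pm I$" in order to simplify the notation. In (a), $yx = \begin{pmatrix}0&1\\-1&-1\end{pmatrix}\begin{pmatrix}0&-1\\1&0\end{pmatrix} = \begin{pmatrix}1&0\\-1&1\end{pmatrix}$ and $y^{-1}x = \begin{pmatrix}-1&-1\\1&0\end{pmatrix}\begin{pmatrix}0&-1\\1&0\end{pmatrix} = \begin{pmatrix}-1&1\\0&-1\end{pmatrix}$. Then $zyx = \begin{pmatrix}a - b&b\\c - d&d\end{pmatrix}$, and $\mu(zyx) = \max(|a - b|, |b|)$. If $ab \le 0$, then $|a - b| \ge |a|$ and hence $\mu(zyx) \ge \mu(z)$. Similarly $zy^{-1}x = \begin{pmatrix}-a&a - b\\-c&c - d\end{pmatrix}$ and $\mu(zy^{-1}x) = \max(|a|, |a - b|)$. If $ab \le 0$, then $|a - b| \ge |b|$ and hence $\mu(zy^{-1}x) \ge \mu(z)$. The arguments with $\nu$ are similar.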

In (b), we have $zx = \begin{pmatrix}b&-a\\d&-c\end{pmatrix}$. Then $\mu(zx) = \max(|b|, |a|) = \mu(z)$ and $\nu(zx) = \max(|d|, |c|) = \nu(z)$.

In (c), the entries of $z$ are limited to $\pm1$ and 0. We may take the first nonzero entry in the first column to be $+1$ by adjusting by $-I$ if necessary. Then the possibilities with determinant 1 are
$$\begin{pmatrix}1&0\\0&1\end{pmatrix}, \begin{pmatrix}1&1\\0&1\end{pmatrix}, \begin{pmatrix}1&-1\\0&1\end{pmatrix}, \begin{pmatrix}1&0\\1&1\end{pmatrix}, \begin{pmatrix}1&0\\-1&1\end{pmatrix}, \begin{pmatrix}1&1\\-1&0\end{pmatrix}, \begin{pmatrix}1&-1\\1&0\end{pmatrix}, \begin{pmatrix}0&-1\\1&-1\end{pmatrix}, \begin{pmatrix}0&-1\\1&0\end{pmatrix}, \begin{pmatrix}0&-1\\1&1\end{pmatrix}.$$

In (d), let us prove by induction on $n$ that if $Z = a_1\cdots a_n$ is reduced and ends in $X$, then $\Phi(Z) = \begin{pmatrix}a&b\\c&d\end{pmatrix}$ has $ab \le 0$. The base cases of the induction are $n = 1$ and $n = 2$, where we have $Z = X$, $Z = YX$, and $Z = Y^{-1}X$; since $\Phi(Z)$ is $\begin{pmatrix}0&-1\\1&0\end{pmatrix}$, $\begin{pmatrix}1&0\\-1&1\end{pmatrix}$, and $\begin{pmatrix}-1&1\\0&-1\end{pmatrix}$ in the three cases, we have $ab \le 0$ for each. For the inductive step we pass from $Z$, which ends in $X$, to anything obtained by adjoining factors at the right in such a way that the new word is still reduced and has $X$ at the right end. This means that $Z$ is replaced by $ZYX$ or by $ZY^{-1}X$. Suppose that $\Phi(Z) = \begin{pmatrix}a&b\\c&d\end{pmatrix}$, and we are assuming that $ab \le 0$. According to the calculation in the solution of (a), the entries in the first row of $\Phi(ZYX)$ are $a - b$ and $b$, with product $(a - b)b = ab - b^2 \le ab \le 0$, and the entries in the first row of $\Phi(ZY^{-1}X)$ are $-a$ and $a - b$, with product $-a(a - b) = -a^2 + ab \le ab \le 0$. Thus the induction goes forward, and our assertion follows.

Now we can prove by induction that
$$\mu(\Phi(a_1\cdots a_n)) \ge \mu(\Phi(a_1\cdots a_{n-1})) \qquad (*)$$
if $Z = a_1\cdots a_n = Z'a_n$ is reduced. The result is trivial for $n = 1$, and we let $n \ge 2$ be given and assume the inequality for words of length $< n$. Let a word of length $n \ge 2$ be given. If $a_n = X$, then $(*)$ is immediate from (b). If $a_n \ne X$, then $a_{n-1} = X$ and $a_n$ is $Y$ or $Y^{-1}$. Also, $ZX$ is a reduced word. From the previous paragraph we know that the product of the entries in the first row of $\Phi(Z')$ is $\le 0$. Applying (b) and then (a), we obtain $\mu(\Phi(Z)) = \mu(\Phi(ZX)) = \mu(\Phi(Z'a_nX)) \ge \mu(\Phi(Z'))$, and this proves $(*)$. Similar arguments apply to $\nu$.

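For (e), we are to prove that if $W$ is a nonempty reduced word, then $\Phi(W)$ is not the identity of $PSL(2, \mathbb{Z})$. Assuming the contrary, we may assume without loss of generality that $W$ is as short as possible with this property. If $W = a_1\cdots a_n$ and $\Phi(W)$ is the identity, then $\mu(\Phi(W)) = \mu(I) = 1$ and similarly $\nu(\Phi(W)) = 1$. By (d), we must have $\mu(\Phi(a_1\cdots a_k)) = \nu(\Phi(a_1\cdots a_k)) = 1$ for $1 \le k \le n$. Then, for each $k$ with $1 \le k < n$, $\Phi(a_1\cdots a_k)$ lies in the set of 10 matrices in (c) but is not the identity. The 10 matrices in (c) are obtained by applying $\Phi$ to the elements $1$, $XY$, $Y^{-1}X$, $XY^{-1}$, $XYX$, $YX$, $Y^{-1}$, $X$, $Y$, and $XY^{-1}X$. The remaining words $W$ of length 3 are $YXY$, $YXY^{-1}$, $Y^{-1}XY$, $Y^{-1}XY^{-1}$, and the ones of length 4 are $XYXY$, $XYXY^{-1}$, $XY^{-1}XY$, $XY^{-1}XY^{-1}$, $YXYX$, $YXY^{-1}X$, $Y^{-1}XYX$, $Y^{-1}XY^{-1}X$. We compute $\Phi$ directly on these 12 reduced words and obtain, respectively,
$$\begin{pmatrix}0&1\\-1&-2\end{pmatrix}, \begin{pmatrix}-1&-1\\2&1\end{pmatrix}, \begin{pmatrix}-1&-2\\1&1\end{pmatrix}, \begin{pmatrix}2&1\\-1&0\end{pmatrix}, \begin{pmatrix}1&2\\0&1\end{pmatrix}, \begin{pmatrix}-2&-1\\-1&-1\end{pmatrix},$$
$$\begin{pmatrix}-1&-1\\-1&-2\end{pmatrix}, \begin{pmatrix}1&0\\2&1\end{pmatrix}, \begin{pmatrix}1&0\\-2&1\end{pmatrix}, \begin{pmatrix}-1&1\\1&-2\end{pmatrix}, \begin{pmatrix}-2&1\\1&-1\end{pmatrix}, \begin{pmatrix}1&-2\\0&1\end{pmatrix}.$$
Consequently $\Phi(W)$ is not the identity for $W$ of positive length $\le 4$. The inequality of (d) and its analog for $\nu$ show that $\mu(\Phi(W)) \ge 2$ or $\nu(\Phi(W)) \ge 2$ if $W$ has length $\ge 4$, and therefore $\Phi(W)$ is the identity only if $W$ is the empty word.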
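The finite computations in (c) through (e) can be confirmed by machine. This sketch uses $x$ and $y$ as in Problem 37, with the letter `V` standing for $Y^{-1}$ in words; it enumerates the 10 matrices of (c), compares them with the images of the 10 listed words, computes the 12 remaining words of length 3 and 4, and checks the monotonicity of $\mu$ and $\nu$ from (d) on all reduced words of length at most 8. The helper names are illustrative.

```python
from itertools import product

I = ((1, 0), (0, 1))
LETTERS = {"X": ((0, -1), (1, 0)), "Y": ((0, 1), (-1, -1)),
           "V": ((-1, -1), (1, 0))}          # "V" stands for Y^{-1}

def mul(m, n):
    (a, b), (c, d) = m
    (e, f), (g, h) = n
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))

def phi(word):
    result = I
    for letter in word:
        result = mul(result, LETTERS[letter])
    return result

def normalize(z):
    """Representative mod +-I whose first nonzero first-column entry is positive."""
    (a, b), (c, d) = z
    lead = a if a != 0 else c
    return z if lead > 0 else ((-a, -b), (-c, -d))

def mu(z):
    return max(abs(z[0][0]), abs(z[0][1]))

def nu(z):
    return max(abs(z[1][0]), abs(z[1][1]))

def reduced_words(max_len):
    """Reduced words of C_2 * C_3: X alternates with Y or V."""
    words, frontier = [""], [""]
    for _ in range(max_len):
        frontier = [w + s for w in frontier
                    for s in (("X", "Y", "V") if not w else ("Y", "V") if w[-1] == "X" else ("X",))]
        words += frontier
    return words

# (c): matrices with entries in {-1, 0, 1} and determinant 1, modulo +-I
SMALL = {normalize(((a, b), (c, d))) for a, b, c, d in product((-1, 0, 1), repeat=4)
         if a * d - b * c == 1}
TEN_WORDS = ["", "XY", "VX", "XV", "XYX", "YX", "V", "X", "Y", "XVX"]
TWELVE_WORDS = ["YXY", "YXV", "VXY", "VXV", "XYXY", "XYXV", "XVXY", "XVXV",
                "YXYX", "YXVX", "VXYX", "VXVX"]

# (d): mu and nu never decrease as letters are appended to a reduced word
MONOTONE = all(mu(phi(w[:k])) <= mu(phi(w[:k + 1])) and nu(phi(w[:k])) <= nu(phi(w[:k + 1]))
               for w in reduced_words(8) for k in range(len(w)))

print(len(SMALL), {normalize(phi(w)) for w in TEN_WORDS} == SMALL, MONOTONE)
for w in TWELVE_WORDS:
    print(w, phi(w))
```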


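40. The definition of $\sigma_m$ is $\sigma_m\begin{pmatrix}a&b\\c&d\end{pmatrix} = \begin{pmatrix}a + m\mathbb{Z}&b + m\mathbb{Z}\\c + m\mathbb{Z}&d + m\mathbb{Z}\end{pmatrix}$. We readily check that $\sigma_m$ respects multiplication and hence is a homomorphism into some group of matrices. Since $(a + m\mathbb{Z})(d + m\mathbb{Z}) - (b + m\mathbb{Z})(c + m\mathbb{Z}) = (ad - bc) + m\mathbb{Z} = 1 + m\mathbb{Z}$, the image group is contained in $SL(2, \mathbb{Z}/m\mathbb{Z})$. The kernel is the set of matrices $\begin{pmatrix}a&b\\c&d\end{pmatrix}$ in $SL(2, \mathbb{Z})$ with $a + m\mathbb{Z} = 1 + m\mathbb{Z}$, $b + m\mathbb{Z} = 0 + m\mathbb{Z}$, $c + m\mathbb{Z} = 0 + m\mathbb{Z}$, $d + m\mathbb{Z} = 1 + m\mathbb{Z}$, and these are exactly the matrices $M$ in $SL(2, \mathbb{Z})$ with every entry of $M - I$ divisible by $m$. Therefore $\ker\sigma_m = \Gamma(m)$. This proves (a).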

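In (b), let $j = \mathrm{GCD}(\alpha, m)$, so that $\alpha j^{-1}$ and $mj^{-1}$ are relatively prime. Applying Dirichlet's theorem on primes in arithmetic progressions, take $p > |\beta|$ to be a prime of the form $p = \alpha j^{-1} + rmj^{-1}$ for some $r$. Then $\alpha + rm = pj$, and $\mathrm{GCD}(\alpha + rm, \beta) = \mathrm{GCD}(pj, \beta) = \mathrm{GCD}(j, \beta) = \mathrm{GCD}(\mathrm{GCD}(\alpha, m), \beta) = \mathrm{GCD}(\alpha, \beta, m) = 1$.

For (c), corresponding to any member of $SL(2, \mathbb{Z}/m\mathbb{Z})$ is a matrix $\begin{pmatrix}a&b\\c&d\end{pmatrix}$ with integer entries with $ad - bc \equiv 1 \bmod m$. If $p$ is a prime dividing $a - b$ and $c - d$, then $ad - bc \equiv bd - bd = 0 \bmod p$, and hence $p$ does not divide $m$. Therefore $\mathrm{GCD}(a - b, c - d, m) = 1$. Applying (b), we obtain an integer $r$ such that $\mathrm{GCD}(a + rm - b, c - d) = 1$. Let us then work instead with $\begin{pmatrix}a + rm&b\\c&d\end{pmatrix}$. Adjusting notation to call this matrix $\begin{pmatrix}a&b\\c&d\end{pmatrix}$, we may assume that $\mathrm{GCD}(a - b, c - d) = 1$.

Since $m$ divides $ad - bc - 1$, there exist integers $C$ and $D$ with
$$(a - b)C + (d - c)D = -(ad - bc - 1)/m.$$
Then $\det\begin{pmatrix}a + mD&b + mD\\c + mC&d + mC\end{pmatrix}$ is equal to
$$(ad - bc) + (d - c)mD + (a - b)mC = (ad - bc) - (ad - bc - 1) = 1,$$
and $\begin{pmatrix}a + mD&b + mD\\c + mC&d + mC\end{pmatrix}$ is a member of $SL(2, \mathbb{Z})$ whose image under $\sigma_m$ is the given matrix in $SL(2, \mathbb{Z}/m\mathbb{Z})$.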
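The lifting argument in (c) is constructive apart from the appeal to Dirichlet's theorem in (b). This sketch replaces that appeal by a direct search for $r$, which Dirichlet's theorem guarantees will terminate once $c - d \ne 0$; when $c = d$ it first replaces $c$ by $c + m$, an adjustment added here for that edge case. The extended Euclidean algorithm then supplies $C$ and $D$. The function names are illustrative.

```python
from itertools import product
from math import gcd

def ext_gcd(u, v):
    """Return (g, s, t) with s*u + t*v == g == gcd(u, v) >= 0."""
    if v == 0:
        return (abs(u), 1 if u >= 0 else -1, 0)
    g, s, t = ext_gcd(v, u % v)
    return (g, t, s - (u // v) * t)

def lift(a, b, c, d, m):
    """Lift [[a, b], [c, d]] with ad - bc = 1 mod m to SL(2, Z), following (b) and (c)."""
    assert (a * d - b * c - 1) % m == 0
    if c == d:
        c += m                      # make c - d nonzero without changing the class mod m
    r = 0
    while gcd(a + r * m - b, c - d) != 1:   # search replacing Dirichlet's theorem
        r += 1
    a += r * m                      # now GCD(a - b, c - d) = 1
    k = -(a * d - b * c - 1) // m   # exact division
    _, s, t = ext_gcd(a - b, d - c)
    C, D = s * k, t * k             # (a - b) C + (d - c) D = k
    return ((a + m * D, b + m * D), (c + m * C, d + m * C))

def lift_ok(a, b, c, d, m):
    (p, q), (u, v) = lift(a, b, c, d, m)
    return p * v - q * u == 1 and all(
        (x - y) % m == 0 for x, y in zip((p, q, u, v), (a, b, c, d)))

print(lift(2, 1, 1, 6, 10))   # ((2, 1), (-9, -4))
print(all(lift_ok(a, b, c, d, m) for m in (2, 3, 4, 6, 10)
          for a, b, c, d in product(range(m), repeat=4) if (a * d - b * c - 1) % m == 0))
```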

41. For the remainder of the problems in this set, it will be convenient to regard the isomorphism $C_2 * C_3 \cong \langle X, Y; X^2, Y^3\rangle$ of Proposition 7.16 as an equality: $C_2 * C_3 = \langle X, Y; X^2, Y^3\rangle$.

In (a), $\Phi_m$ is well defined as a consequence of the second conclusion of Proposition 7.8.

In (b), it is immediate from Proposition 7.8 that the kernel of $\Phi_m$ is the smallest normal subgroup of $C_2 * C_3$ containing the element $(XY)^m$. Under the isomorphism $\Phi: C_2 * C_3 \to PSL(2, \mathbb{Z})$, we have $\Phi((XY)^m) = (xy)^m \bmod \pm I$. Since the smallest normal subgroup $H_m$ of $PSL(2, \mathbb{Z})$ containing $(xy)^m \bmod \pm I = \Phi((XY)^m)$ is $\Phi$ of the smallest normal subgroup of $C_2 * C_3$ containing $(XY)^m$, we have $H_m = \Phi(\ker\Phi_m)$.

In (c), if passage to the quotient is denoted by $q_m$, Proposition 4.11 shows that the point needing verification is that the scalar matrices in $SL(2, \mathbb{Z})$ lie in the kernel of $q_m \circ \sigma_m$, and this follows since $\pm\begin{pmatrix}1&0\\0&1\end{pmatrix}$ maps under $\sigma_m$ to the matrix with entries taken modulo $m$ and then maps to the identity under $q_m$.

In (d), $K_m$ is a normal subgroup of $PSL(2, \mathbb{Z})$, and it is thus enough to show that the element $(xy)^m \bmod \pm I$ of $H_m$ is in $K_m$. Since $xy = \begin{pmatrix}1&1\\0&1\end{pmatrix}$ and since the $m$th power of this matrix is in $\Gamma(m)$, $(xy)^m \bmod \pm I$ is indeed in $K_m$.

For (e), part (d) shows that $\begin{pmatrix}1&m\\0&1\end{pmatrix} \bmod \pm I$ is in $K_m$, and its $t$th power $\begin{pmatrix}1&tm\\0&1\end{pmatrix} \bmod \pm I$, for $t$ an integer, has to be in $K_m$. Then $\begin{pmatrix}1&0\\-tm&1\end{pmatrix} = x\begin{pmatrix}1&tm\\0&1\end{pmatrix}x^{-1} \bmod \pm I$ is in $K_m$ since $K_m$ is normal, and so are $\begin{pmatrix}1 + tm&tm\\-tm&1 - tm\end{pmatrix} = y^{-1}\begin{pmatrix}1&tm\\0&1\end{pmatrix}y$ and $\begin{pmatrix}1 - tm&tm\\-tm&1 + tm\end{pmatrix} = xy^{-1}\begin{pmatrix}1&tm\\0&1\end{pmatrix}yx^{-1}$ for the same reason.

42. Let $x$ and $y$ be the listed images in the stated permutation groups of $X$ and $Y$. The homomorphisms in this problem come from Proposition 7.8 since in each case $x^2 = 1$, $y^3 = 1$, and $(xy)^m$ can be verified to be 1. What needs to be verified in each case is that $x$ and $y$ generate the stated permutation group.

In (a), the image group has a subgroup of order 2 and a subgroup of order 3 and hence must be the whole 6-element group $\mathfrak{S}_3$.

In (b), Lemma 4.41 shows that $(1\,2\,3)(1\,2)(3\,4)(1\,2\,3)^{-1} = (2\,3)(1\,4)$, and hence the image group has a subgroup of 4 even permutations and a subgroup of 3 even permutations; therefore it must be all of $\mathfrak{A}_4$.

In (c), we have $(1\,2)(2\,3\,4) = (1\,2\,3\,4)$. Thus the image group contains $(1\,2\,3\,4)^2 = (1\,3)(2\,4)$, $(2\,3\,4)(1\,3)(2\,4)(2\,3\,4)^{-1} = (1\,4)(2\,3)$, and $(2\,3\,4)(1\,2)(2\,3\,4)^{-1} = (1\,3)$, hence a subgroup of order 8 and a subgroup of order 3. Therefore it is all of $\mathfrak{S}_4$.

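In (d), we have $(1\,2)(3\,4)(1\,3\,5) = (1\,4\,3\,5\,2)$. Thus the image group contains a subgroup of order 5, a subgroup of order 3, and a subgroup of order 2, all contained in $\mathfrak{A}_5$, and so its order is divisible by 30. The image group is not of order 30, because a subgroup of order 30 would have index 2 and hence be normal, while $\mathfrak{A}_5$ has no nontrivial proper normal subgroups. Hence it must be all of $\mathfrak{A}_5$.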
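The generation claims in Problem 42 are finite computations, so they can be double-checked by machine. The Python sketch below is only an illustration, not part of the solution. It composes permutations right to left, which is the convention under which the products displayed above hold, and it assumes that the images $x$ and $y$ in parts (b) through (d) are the factors appearing in those products, since the statement of the problem is not reproduced here. All helper names are our own.

```python
# Permutations of {1, ..., n} are stored as tuples p with p[i] = image of i (index 0 unused).

def perm(n, *cycles):
    """Build the permutation of {1, ..., n} given as a product of disjoint cycles."""
    p = list(range(n + 1))
    for cyc in cycles:
        for k, a in enumerate(cyc):
            p[a] = cyc[(k + 1) % len(cyc)]
    return tuple(p)

def compose(p, q):
    """Product pq: apply q first, then p (right-to-left convention)."""
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    inv = [0] * len(p)
    for i, a in enumerate(p):
        inv[a] = i
    return tuple(inv)

def order(p):
    """Smallest k >= 1 with p^k equal to the identity."""
    ident = tuple(range(len(p)))
    k, r = 1, p
    while r != ident:
        r = compose(p, r)
        k += 1
    return k

def generated_group_size(gens):
    """Size of the subgroup generated by gens (closure under left multiplication)."""
    ident = tuple(range(len(gens[0])))
    seen, frontier = {ident}, [ident]
    while frontier:
        g = frontier.pop()
        for s in gens:
            h = compose(s, g)
            if h not in seen:
                seen.add(h)
                frontier.append(h)
    return len(seen)

# (x, y, m) for parts (b), (c), (d); x and y are read off from the products in the text.
cases = {
    "b": (perm(4, (1, 2), (3, 4)), perm(4, (1, 2, 3)), 3),
    "c": (perm(4, (1, 2)), perm(4, (2, 3, 4)), 4),
    "d": (perm(5, (1, 2), (3, 4)), perm(5, (1, 3, 5)), 5),
}

for part, (x, y, m) in cases.items():
    print(part, order(x), order(y), order(compose(x, y)), generated_group_size([x, y]))
```

Each printed line lists the orders of $x$, $y$, $xy$ and then the size of the generated group; the sizes come out as 12, 24, and 60, the orders of $\mathfrak{A}_4$, $\mathfrak{S}_4$, and $\mathfrak{A}_5$.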


43. As with Problem 39, let us drop the "$\bmod \pm I$" in order to simplify the notation.
In  (a),  we  can  take  £1  =  (j°),  £2  =  £3  =  £4  =  (_5J), 


85 


-C 


-1  -1 

1  0 , 


For 


(b),  first  we  compute  the  six  values  of  gib\  as  g\b\  =  £2^1  = 

(-1J)’*3*1  =  (“!-!)* =  (”o-i)’gsfoi  =  (J-!)’*6*1  =  (-!_!)’ 

and  then  we  compute  the  six  values  of  £,T>2  as  g\b 2  =  ^  _j  _[  £2^2  —  ^  _[ 

*3*2  =  (“J“J).S4*2  =  (“J  l[).  85bl  =  (01  )-«6*2  =  (-iJ)-  Nextwe 
locate  each  of  these  products  in  a  coset,  writing  them  with  some  g ,  on  the  right. 
We find that, up to $\bmod \pm I$, the results are $g_1b_1 = g_4$, $g_2b_1 = g_5$, $g_3b_1 = g_6$,


£4^1  =  £1,  £5^1  =  £2,  £6^1  =  £3,  £1^2  =  £3,  glb2  =  (_2  _j)  £6,  £3^2  =  £5, 
g4b2  =  (o  i)  £2>  £5^2  =  £i,  £6^2  =  (_2  i)  £4-  The  conclusion  is  that  generators 

of  K2  are  the  three  matrices  _i)’ (oi)’(_2i)' 

For  (c),  the  second  and  third  of  the  generators  in  (b)  are  in  H2  by  Problem  41e. 
The  equality  ^  ^  ^  ^  j  1  f  \  j )  exhibits  the  first  of  the  generators  as  in 
$H_2$. Hence all the generators are in $H_2$, and $K_2 \subseteq H_2$. Therefore $K_2 = H_2$.

For(d)withm  =  3,  wecan  take  the  12  coset  representatives  to  be  gi  =  ^  ty,g2  = 

88  =  (!1).S9  =  *10  =  (_;  ;).sn  =  (i“). *12  = 


Then 


we  compute  that  g\b\  = 


(-") 


U) 


=  g9,  gib\  = 


=  g4,  gibi  = 

,  g4bl  =  (“J  _i)  =  g 1,  g5bl  =  (ij  _°)  =  gll,  g6^1  =  ( 

■'•h'  =  GO  =  83.88*1  =  (:G)  =  (:J_0  sio.89*.  =  (-;_0  = 


0- 


gl2, 

82, 


g  6- 


gio^i  =  ( _j )  =  (_j  ?)  g%,gubi  =  ( . j ) = 8s,  gnbi  =  (_;_!) 

Also, gib2  =  (_!_))=  ge,g2b2  =  (_)  J)  =  (4  -2)  8w,g3b2  =  (44) 


=  gll.  g4^2  = 


(T!) 


=  83,  g5b2  = 


(-1-0 


=  g8,  86^2  — 


(ro 


^  =  (4)  =  (J0g2,g8i2  =  (-2-!) =  (-J-O*12’*9*2  =  ( J-0  = 


gl,  giofo  = 

(-10 


(-1-0  =  (l-0 


g7.  gllfe  = 


(-1  0 


=  84,  gnb2  — 


U-D 


I  g5- 


Thus generators of $K_3$ are


(:;_0.(J004-l)0l00:lO).(1-0- 

(  4  j )  •  All  but  (  _(j  4  j  and  (  3  -5  )  are  certainly  in  ^3  ■  The  expressions  (44) 

=  ( '4  4  3 ) and  ( 4  _5 )  =  (4 ,4 )  ( J  _0 show  that  these  two  §enerat°rs  are 

in $H_3$. Therefore $K_3 = H_3$.


44. Problem 41 produces a homomorphism $\sigma_m$ of $G_m$ onto $\mathrm{PSL}(2,\mathbb{Z}/m\mathbb{Z})$ with kernel isomorphic to $K_m/H_m$. The given fact $H_m = K_m$ for $2 \le m \le 5$ implies that $\sigma_m$ is an isomorphism for these values of $m$. This proves the first isomorphism in each part. Problem 42 gives us homomorphisms of $G_m$ for these $m$'s onto the third group listed in each part. Composition with $\sigma_m^{-1}$ then gives a homomorphism of $\mathrm{PSL}(2,\mathbb{Z}/m\mathbb{Z})$ onto the third group. In each case the statement of Problem 43 gives the number of elements in $\mathrm{PSL}(2,\mathbb{Z}/m\mathbb{Z})$, and this matches the number of elements in the third group. It follows that these homomorphisms are isomorphisms.
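The counting step in Problem 44 can be sanity-checked by brute force. The sketch below assumes that $\mathrm{PSL}(2,\mathbb{Z}/m\mathbb{Z})$ means $\mathrm{SL}(2,\mathbb{Z}/m\mathbb{Z})$ modulo the scalar matrices $\pm I$ (which coincide when $m = 2$). It enumerates every 2-by-2 matrix over $\mathbb{Z}/m\mathbb{Z}$, so it is only practical for small $m$; the function name is hypothetical.

```python
from itertools import product

def psl2_order(m):
    """Order of SL(2, Z/mZ) modulo {I, -I}, found by enumerating all matrices (m >= 2)."""
    sl_order = sum(
        1
        for a, b, c, d in product(range(m), repeat=4)
        if (a * d - b * c) % m == 1
    )
    # -I coincides with I exactly when -1 = 1 mod m, i.e. when m = 2.
    scalars = 1 if m == 2 else 2
    return sl_order // scalars

# Orders of the third groups in Problem 44: S3, A4, S4, A5.
third_group_orders = {2: 6, 3: 12, 4: 24, 5: 60}

for m in range(2, 6):
    print(m, psl2_order(m), third_group_orders[m])
```

For each $m$ from 2 to 5 the two printed counts agree, as the argument above requires.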

45. For (a), linearity gives $R_\theta T_{(a,b)} R_\theta^{-1}(x,y) = R_\theta\big(R_\theta^{-1}(x,y) + (a,b)\big) = R_\theta R_\theta^{-1}(x,y) + R_\theta(a,b) = (x,y) + R_\theta(a,b) = T_{R_\theta(a,b)}(x,y)$.

For (b), the result of (a) says that we get a semidirect product. Let us show that the two sets (the elements of the semidirect product, and the union of the translations and rotations) coincide. In one direction a rotation about $(x_0, y_0)$ is of the form

$$(x,y) \mapsto R_\theta(x - x_0, y - y_0) + (x_0, y_0) = R_\theta(x,y) + (a,b) = T_{(a,b)}R_\theta(x,y),$$

where $(a,b) = -R_\theta(x_0,y_0) + (x_0,y_0)$. Hence it is in the semidirect product. In the reverse direction suppose that $T_{(a,b)}R_\theta$ is in the semidirect product and is not a translation. Then $\theta$ is not a multiple of $2\pi$, so $1 - R_\theta$ is invertible, and we can put $(x_0,y_0) = (1 - R_\theta)^{-1}(a,b)$. Then we have

$$\begin{aligned}
T_{(a,b)}R_\theta(x,y) &= R_\theta(x,y) + (a,b) = R_\theta(x-x_0, y-y_0) + R_\theta(x_0,y_0) + (a,b) \\
&= R_\theta(x-x_0, y-y_0) + R_\theta(1-R_\theta)^{-1}(a,b) + (a,b) \\
&= R_\theta(x-x_0, y-y_0) - (1-R_\theta)(1-R_\theta)^{-1}(a,b) + (1-R_\theta)^{-1}(a,b) + (a,b) \\
&= R_\theta(x-x_0, y-y_0) + (x_0,y_0).
\end{aligned}$$

Hence $T_{(a,b)}R_\theta$ is a rotation about $(x_0,y_0)$.
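A quick numerical check of the reverse direction in (b), using nothing beyond the formulas above: for a motion $T_{(a,b)}R_\theta$ that is not a translation, the point $(x_0,y_0) = (1-R_\theta)^{-1}(a,b)$ is fixed, and the motion agrees with rotation by $\theta$ about that point. The helper names and sample values are our own, and floating-point comparisons use a tolerance.

```python
import math

def rotate(theta, v):
    """R_theta applied to the vector v = (x, y)."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])

def center(theta, a, b):
    """(1 - R_theta)^{-1}(a, b); requires theta not a multiple of 2*pi."""
    c, s = math.cos(theta), math.sin(theta)
    # 1 - R_theta = [[1 - c, s], [-s, 1 - c]], with determinant (1 - c)^2 + s^2 = 2 - 2c.
    det = (1 - c) ** 2 + s ** 2
    return (((1 - c) * a - s * b) / det, (s * a + (1 - c) * b) / det)

def T_R(theta, a, b, v):
    """The motion T_(a,b) R_theta: rotate about the origin, then translate by (a, b)."""
    rx, ry = rotate(theta, v)
    return (rx + a, ry + b)

theta, a, b = 1.0, 2.0, -3.0
x0, y0 = center(theta, a, b)
v = (0.7, 4.2)
lhs = T_R(theta, a, b, v)                      # T_(a,b) R_theta (v)
rx, ry = rotate(theta, (v[0] - x0, v[1] - y0))
rhs = (rx + x0, ry + y0)                       # rotation by theta about (x0, y0)
print(lhs, rhs)
```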

46. In (a), we need to show only that $r_c = r_a r_b$. In (b), we need to show that $r_b r_a r_b r_a r_b$ is a translation but not the identity. Then it follows from (b) that the group $G$ generated by $r_a$ and $r_b$ is infinite. Since (a) and Proposition 7.8 yield a homomorphism of $G_6 = \langle X, Y;\ X^2, Y^3, (XY)^6\rangle$ onto the infinite group $G$, it follows that $G_6$ is infinite. Since $\mathrm{PSL}(2,\mathbb{Z}/6\mathbb{Z})$ is finite, (c) follows.

To establish the two facts that need checking, we may, without loss of generality, take $T$ to be the triangle with vertices $a = (0,0)$, $b = (0,-1)$, and $c = (\sqrt3, 0)$. The formulas for $r_a$, $r_b$, and $r_c$ are $r_a(x,y) = (-x,-y)$,

$$\begin{aligned}
r_b(x,y) &= \big(x\cos\tfrac{2\pi}{3} - (y+1)\sin\tfrac{2\pi}{3},\ x\sin\tfrac{2\pi}{3} + (y+1)\cos\tfrac{2\pi}{3} - 1\big) \\
&= \big(-x/2 - y\sqrt3/2 - \sqrt3/2,\ x\sqrt3/2 - y/2 - 1/2 - 1\big),
\end{aligned}$$

and

$$\begin{aligned}
r_c(x,y) &= \big((x-\sqrt3)\cos\tfrac{\pi}{3} + y\sin\tfrac{\pi}{3} + \sqrt3,\ -(x-\sqrt3)\sin\tfrac{\pi}{3} + y\cos\tfrac{\pi}{3}\big) \\
&= \big((x-\sqrt3)/2 + y\sqrt3/2 + \sqrt3,\ -(x-\sqrt3)\sqrt3/2 + y/2\big).
\end{aligned}$$

Then $r_a r_b(x,y) = -r_b(x,y) = r_c(x,y)$ by inspection.

To verify that $r_b r_a r_b r_a r_b$ is a translation, we write $r_b r_a r_b r_a r_b(x,y) = r_b r_c^2(x,y)$. The formula above for $r_c$ gives

$$\begin{aligned}
r_c^2(x,y) &= \big((x-\sqrt3)\cos\tfrac{2\pi}{3} + y\sin\tfrac{2\pi}{3} + \sqrt3,\ -(x-\sqrt3)\sin\tfrac{2\pi}{3} + y\cos\tfrac{2\pi}{3}\big) \\
&= \big(-(x-\sqrt3)/2 + y\sqrt3/2 + \sqrt3,\ -(x-\sqrt3)\sqrt3/2 - y/2\big).
\end{aligned}$$

Then the first coordinate of $r_b r_c^2(x,y)$ is $-\tfrac12\big(-(x-\sqrt3)/2 + y\sqrt3/2 + \sqrt3\big) + \big((x-\sqrt3)\sqrt3/2 + y/2\big)\sqrt3/2 - \sqrt3/2 = x - 2\sqrt3$, while the second coordinate is $\big(-(x-\sqrt3)/2 + y\sqrt3/2 + \sqrt3\big)\sqrt3/2 + \big((x-\sqrt3)\sqrt3/2 + y/2\big)/2 - 3/2 = y$. So $r_b r_c^2(x,y) = (x - 2\sqrt3, y)$ is a translation.
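The two facts just verified can also be spot-checked numerically from the explicit formulas for $r_a$, $r_b$, $r_c$ above. This floating-point sketch (helper names our own) only tests a few sample points, so it supplements rather than replaces the algebra; it also confirms that $r_b^3$ and $r_c^6$ are the identity on those points.

```python
import math

SQRT3 = math.sqrt(3)

def r_a(p):
    """Rotation by pi about a = (0, 0)."""
    x, y = p
    return (-x, -y)

def r_b(p):
    """Rotation by 2*pi/3 about b = (0, -1), using the simplified formula in the text."""
    x, y = p
    return (-x / 2 - y * SQRT3 / 2 - SQRT3 / 2, x * SQRT3 / 2 - y / 2 - 1 / 2 - 1)

def r_c(p):
    """Rotation through pi/3 (clockwise) about c = (sqrt(3), 0), as in the text."""
    x, y = p
    return ((x - SQRT3) / 2 + y * SQRT3 / 2 + SQRT3, -(x - SQRT3) * SQRT3 / 2 + y / 2)

def iterate(f, k, p):
    """Apply f to p k times."""
    for _ in range(k):
        p = f(p)
    return p

def close(p, q, tol=1e-9):
    return all(math.isclose(s, t, abs_tol=tol) for s, t in zip(p, q))

samples = [(0.0, 0.0), (1.5, -2.0), (-3.2, 0.4)]
for p in samples:
    word = r_b(r_a(r_b(r_a(r_b(p)))))   # r_b r_a r_b r_a r_b, applied right to left
    print(close(r_a(r_b(p)), r_c(p)), close(word, (p[0] - 2 * SQRT3, p[1])))
```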

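47. We may suppose that the representations are unitary. Let $\{v_{1,i}\}$ and $\{v_{2,j}\}$ be orthonormal bases of $V_1$ and $V_2$. Then

$$\begin{aligned}
(\chi_{R_1} * \chi_{R_2})(x) &= \sum_{y} \chi_{R_1}(xy^{-1})\chi_{R_2}(y) \\
&= \sum_{y,i,j} (R_1(xy^{-1})v_{1,i}, v_{1,i})(R_2(y)v_{2,j}, v_{2,j}) \\
&= \sum_{y,i,j,k} (R_1(x)v_{1,k}, v_{1,i})(R_1(y^{-1})v_{1,i}, v_{1,k})(R_2(y)v_{2,j}, v_{2,j}) \\
&= \sum_{i,j,k} (R_1(x)v_{1,k}, v_{1,i}) \sum_{y} \overline{(R_1(y)v_{1,k}, v_{1,i})}\,(R_2(y)v_{2,j}, v_{2,j}).
\end{aligned}$$

For (a), the inside sum is 0 by the Schur orthogonality relations, and the argument is complete. For (b), let $R_1 = R_2$ and $v_{2,j} = v_{1,j}$. Then the right side of the display continues as

$$\begin{aligned}
&= \sum_{i,j,k} (R_1(x)v_{1,k}, v_{1,i})\,|G|d^{-1}(v_{1,j}, v_{1,k})\overline{(v_{1,j}, v_{1,i})} \\
&= |G|d^{-1} \sum_{i,j,k} (R_1(x)v_{1,k}, v_{1,i})\,\delta_{jk}\delta_{ji} \\
&= |G|d^{-1} \sum_{i} (R_1(x)v_{1,i}, v_{1,i}) = |G|d^{-1}\chi_{R_1}(x).
\end{aligned}$$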
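For a concrete instance of Problem 47, here is a small check on the symmetric group on three letters. It relies on the standard fact (not taken from the text) that this group has three irreducible characters: the trivial one, the sign, and the character of the 2-dimensional irreducible representation, which equals the number of fixed points minus 1. The sketch checks that $\chi * \chi = |G|d^{-1}\chi$ and that convolutions of characters of inequivalent representations vanish; all names are our own.

```python
from itertools import permutations

G = list(permutations(range(3)))  # S3 acting on {0, 1, 2}; g[i] is the image of i

def compose(g, h):
    """(gh)(i) = g(h(i))."""
    return tuple(g[h[i]] for i in range(3))

def inverse(g):
    inv = [0] * 3
    for i, gi in enumerate(g):
        inv[gi] = i
    return tuple(inv)

def chi_trivial(g):
    return 1

def chi_sign(g):
    inversions = sum(1 for i in range(3) for j in range(i + 1, 3) if g[i] > g[j])
    return -1 if inversions % 2 else 1

def chi_standard(g):
    """Character of the 2-dimensional irreducible representation: fixed points minus 1."""
    return sum(1 for i in range(3) if g[i] == i) - 1

def convolve(f1, f2):
    """(f1 * f2)(x) = sum over y in G of f1(x y^{-1}) f2(y)."""
    return {x: sum(f1(compose(x, inverse(y))) * f2(y) for y in G) for x in G}

order = len(G)
same = convolve(chi_standard, chi_standard)
print(all(same[x] == order // 2 * chi_standard(x) for x in G))  # |G| d^{-1} chi with d = 2
print(all(v == 0 for v in convolve(chi_trivial, chi_sign).values()))
```

Both lines print `True`.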


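48. We have $E_\alpha E_\beta = |G|^{-2}d_\alpha d_\beta R(\overline{\chi_\alpha})R(\overline{\chi_\beta}) = |G|^{-2}d_\alpha d_\beta R(\overline{\chi_\alpha} * \overline{\chi_\beta})$, and $\overline{\chi_\alpha} * \overline{\chi_\beta}$ is the complex conjugate of $\chi_\alpha * \chi_\beta$. Problem 47a shows that this is 0 if $R_\alpha$ and $R_\beta$ are inequivalent; this proves (b). Problem 47b shows that the computation with $R_\alpha = R_\beta$ continues as $= |G|^{-1}d_\alpha R(\overline{\chi_\alpha}) = E_\alpha$; this proves (a).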

49. Let $S$ be the set of all finite-dimensional irreducible invariant subspaces $V_s$ of $V$. Call a subset $T$ of $S$ "independent" if the sum $\sum_{t \in T} V_t$ is direct. This condition means that for every finite subset $\{t_1, \dots, t_n\}$ of $T$ and every set of elements $v_i \in V_{t_i}$, the equation

$$v_1 + \cdots + v_n = 0$$

implies that each $v_i$ is 0. From this formulation it follows that the union of any increasing chain of independent subsets of $S$ is itself independent. By Zorn's Lemma there is a maximal independent subset $T_0$ of $S$. By definition the sum $V_0 = \sum_{t \in T_0} V_t$ is direct. Consequently the problem is to show that $V_0$ is all of $V$. Since every member of $V$ lies in a finite direct sum of finite-dimensional irreducible invariant subspaces of $V$, it suffices to show that each $V_s$ is contained in $V_0$. If $s$ is in $T_0$, this conclusion is obvious. Thus suppose $s$ is not in $T_0$. By the maximality of $T_0$, $T_0 \cup \{s\}$ is not independent. Consequently the sum $V_0 + V_s$ is not direct, and it follows that $V_0 \cap V_s \ne 0$. But this intersection is an invariant subspace of $V_s$. Since $V_s$ is irreducible, a nonzero invariant subspace must be all of $V_s$. Thus $V_s$ is contained in $V_0$, as we wished to show.

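50. Let us impose an inner product on $V_0$ that makes $R|_{V_0}$ unitary. Let $\{v_1, \dots, v_n\}$ be an orthonormal basis of $V_0$. If we write $R(x)v_j = \sum_{i=1}^n R_{ij}(x)v_i$, then $R_{ij}(x) = (R(x)v_j, v_i)$. Consequently the character $\chi_\alpha$ of $R|_{V_0}$ is given by $\chi_\alpha(x) = \sum_k R_{kk}(x)$. Then, by the Schur orthogonality relations, we have

$$E_\alpha v_j = |G|^{-1}d_\alpha \sum_{x \in G} \overline{\chi_\alpha(x)}R(x)v_j = |G|^{-1}d_\alpha \sum_{x \in G}\sum_{i,k} \overline{R_{kk}(x)}R_{ij}(x)v_i = v_j,$$

and $E_\alpha$ is the identity on $V_0$.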

51. Problem 49 allows us to write $V$ as the direct sum of possibly infinitely many finite-dimensional irreducible invariant subspaces, $V = \bigoplus_\gamma V_\gamma$. If any $v$ in $V$ is given, we can write $v = \sum_\gamma v_\gamma$ with only finitely many terms nonzero. Applying $E_\alpha$ and using Problem 50, we see that $E_\alpha v$ is the sum of those $v_\gamma$ such that $R|_{V_\gamma}$ is equivalent to $R_\alpha$. Thus each nonzero $v_\gamma$ has the property that $E_\alpha v_\gamma = v_\gamma$ for some $\alpha$.

On the other hand, this equality cannot hold for two distinct $\alpha$'s. In fact, if $R_\alpha$ and $R_\beta$ are inequivalent and we have $E_\alpha v_\gamma = v_\gamma$ and $E_\beta v_\gamma = v_\gamma$, then application of $E_\alpha$ to the second equality gives $E_\alpha E_\beta v_\gamma = E_\alpha v_\gamma = v_\gamma$. But $E_\alpha E_\beta = 0$ by Problem 48b, and hence $v_\gamma = 0$.

The conclusion is that for each nonzero $v_\gamma$, there is one and only one $E_\alpha$ such that $E_\alpha v_\gamma \ne 0$, and that $\alpha$ has $E_\alpha v_\gamma = v_\gamma$. Applying $\sum_\alpha E_\alpha$ to $v = \sum_\gamma v_\gamma$, we obtain $\sum_\alpha E_\alpha v = \sum_\alpha\sum_\gamma E_\alpha v_\gamma = \sum_\gamma v_\gamma = v$. Thus $\sum_\alpha E_\alpha = I$. Problem 50 shows that $E_\alpha$ is the identity on any finite sum of vectors lying in finite-dimensional irreducible invariant subspaces equivalent to $R_\alpha$. The direct-sum decomposition just proved shows that $E_\alpha$ is 0 on any vector in the direct sum of the images of the other $E_\beta$'s. Thus the image of $E_\alpha$ is as asserted.
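To see Problems 48 and 51 in a small case, take the permutation representation of the symmetric group on three letters on a 3-dimensional space. The sketch below assumes the normalization $E_\alpha = d_\alpha|G|^{-1}\sum_x \overline{\chi_\alpha(x)}R(x)$; since every character of this group is real-valued, the conjugation has no effect and is omitted. Using exact rational arithmetic, it checks that the three projections sum to $I$, that distinct ones multiply to 0, and that each is idempotent. The helper names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations

G = list(permutations(range(3)))  # S3 acting on {0, 1, 2}; g[i] is the image of i
N = 3

def perm_matrix(g):
    """Matrix of the permutation representation: R(g) e_j = e_{g(j)}."""
    return [[Fraction(1 if g[j] == i else 0) for j in range(N)] for i in range(N)]

def sign(g):
    inversions = sum(1 for i in range(N) for j in range(i + 1, N) if g[i] > g[j])
    return -1 if inversions % 2 else 1

def chi_standard(g):
    return sum(1 for i in range(N) if g[i] == i) - 1

# (dimension d_alpha, character chi_alpha); all characters of S3 are real-valued.
irreducibles = {
    "trivial": (1, lambda g: 1),
    "sign": (1, sign),
    "standard": (2, chi_standard),
}

def projection(d, chi):
    """E = d |G|^{-1} sum_x chi(x) R(x) (conjugation omitted: chi is real here)."""
    E = [[Fraction(0)] * N for _ in range(N)]
    for g in G:
        R = perm_matrix(g)
        coeff = Fraction(d * chi(g), len(G))
        for i in range(N):
            for j in range(N):
                E[i][j] += coeff * R[i][j]
    return E

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(N)) for j in range(N)] for i in range(N)]

def matadd(A, B):
    return [[A[i][j] + B[i][j] for j in range(N)] for i in range(N)]

E = {name: projection(d, chi) for name, (d, chi) in irreducibles.items()}
identity = [[Fraction(int(i == j)) for j in range(N)] for i in range(N)]
zero = [[Fraction(0)] * N for _ in range(N)]

print(matadd(matadd(E["trivial"], E["sign"]), E["standard"]) == identity)  # sum of E_alpha is I
print(matmul(E["trivial"], E["standard"]) == zero)                          # E_alpha E_beta = 0
print(matmul(E["standard"], E["standard"]) == E["standard"])               # E_alpha^2 = E_alpha
```

All three lines print `True`. Here the sign representation does not occur, so its projection is the zero matrix, in line with the description of the image of $E_\alpha$ in Problem 51.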

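52. For $\alpha$ as given and for any $v$ in $V$, we have $E_\alpha v = |G|^{-1}\sum_{x \in G}\overline{\omega(x)}R(x)v$. The members of the image of $E_\alpha$ are exactly the vectors $v$ for which $E_\alpha v = v$, hence exactly the vectors $v$ for which $|G|^{-1}\sum_{x \in G}\overline{\omega(x)}R(x)v = v$. Applying $R(y)$ to both sides gives

$$R(y)v = |G|^{-1}\sum_{x \in G}\overline{\omega(x)}R(yx)v = |G|^{-1}\sum_{x \in G}\overline{\omega(y^{-1}x)}R(x)v = \overline{\omega(y^{-1})}\,|G|^{-1}\sum_{x \in G}\overline{\omega(x)}R(x)v = \overline{\omega(y^{-1})}\,v = \omega(y)v.$$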


Chapter  VIII 

1. In (a), $\varphi$ fixes 1 and must therefore fix the subfield generated by 1; this is $\mathbb{Q}$. For (b), $\varphi(a^2) = \varphi(a)^2$. For (c), if $a < b$, then $b - a = c^2$ for some $c \ne 0$. Hence $\varphi(b) - \varphi(a) = \varphi(c)^2$ with $\varphi(c) \ne 0$, and $\varphi(a) < \varphi(b)$. For (d), let $r$ be any real, let $\epsilon > 0$ be given, and choose rationals $q_1$ and $q_2$ with $q_1 < r < q_2$ and $q_2 - q_1 < \epsilon$. Then $q_1 = \varphi(q_1) < \varphi(r) < \varphi(q_2) = q_2$ by (a) and (c). Hence $|\varphi(r) - r| < \epsilon$. Since $\epsilon$ is arbitrary, $\varphi(r) = r$.

2. $(1 + r)^{-1} = 1 - r + r^2 - r^3 + \cdots \pm r^{n-1}$ if $r^n = 0$.
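To see the formula in a concrete commutative ring, take $\mathbb{Z}/72\mathbb{Z}$, where 6 is nilpotent because $6^3 = 216 = 3 \cdot 72$. The helper below (its name is ours) sums $1 - r + r^2 - \cdots$ until the powers of $r$ vanish. It assumes that $r$ really is nilpotent modulo the given modulus; otherwise the loop would not terminate.

```python
def inverse_of_one_plus_nilpotent(r, modulus):
    """(1 + r)^{-1} in Z/modulus as 1 - r + r^2 - ... +/- r^(n-1), assuming r^n = 0 there."""
    total, term, sign = 0, 1, 1
    while term % modulus != 0:          # stop once the next power of r is 0
        total = (total + sign * term) % modulus
        term = term * r % modulus
        sign = -sign
    return total

s = inverse_of_one_plus_nilpotent(6, 72)   # 1 - 6 + 36 = 31
print(s, (1 + 6) * s % 72)                 # prints: 31 1
```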

3.  This  follows  from  the  universal  mapping  property  of  the  field  of  fractions. 

4. Suppose that $X$ divides $A(X)B(X)$, i.e., $A(X)B(X) = XC(X)$. If $a_0$ and $b_0$ are the constant terms of $A(X)$ and $B(X)$, we then have $a_0b_0 = 0$. If $a_0 = 0$, then $X$ divides $A(X)$; if $b_0 = 0$, then $X$ divides $B(X)$. Hence $X$ is prime.

5. In (a), take $(X)$ as the ideal. It is prime by Problem 4. Suppose that $a$ is a nonzero member of $R$ with no inverse in $R$. Then $(X)$ is not maximal since $(a, X)$ strictly contains it and does not contain 1. For (b), we can use $(a, X)$.

6. In (a), $I_{x_0}$ is certainly an ideal. Suppose $J$ is an ideal with $I_{x_0} \subsetneq J$. Choose $f$ in $J$ that is not in $I_{x_0}$. The function $x - x_0$ is in $I_{x_0}$. Therefore $g = f^2 + (x - x_0)^2$ is in $J$. This function is everywhere $> 0$, and consequently $1/g$ is in $R$. Hence $1 = (1/g)g$ is in $J$, and $J$ cannot be proper. So $I_{x_0}$ is maximal.
Part (b) uses the Heine-Borel Theorem. For each point $p$ in $[0,1]$, choose a function $f_p$ in $I$ with $f_p(p) \ne 0$. By continuity, $f_p$ is nonvanishing on some open set $N_p$ containing $p$. As $p$ varies, these open sets $N_p$ cover $[0,1]$. The Heine-Borel Theorem produces finitely many $N_{p_1}, \dots, N_{p_k}$ that cover $[0,1]$. Then $f_{p_j}$ is nonvanishing on $N_{p_j}$. If $x$ is a member of $[0,1]$, then $x$ is in some $N_{p_j}$, and $f_{p_j}$ does not vanish at $x$. Thus the functions $f_{p_1}, \dots, f_{p_k}$ have no common zero.

For (c), suppose that the maximal ideal $I$ is not some $I_p$. Using (b), we form the function $g = f_{p_1}^2 + \cdots + f_{p_k}^2$. This is in $I$ and is everywhere positive. The function $1/g$ is therefore in $R$, and $1 = (1/g)g$ is in $I$. Hence $I = R$, in contradiction to the fact that $I$ is proper.

7. In (a), $I_\infty$ is an ideal, and it is properly contained in the proper ideal of all members of $R$ vanishing at $-\infty$. Part (b) follows from Proposition 8.8. The reason for (c) is that for each $x_0$ in $\mathbb{R}$, there is a member of $R$ that is nonzero at $x_0$ and vanishes at infinity; this function has to be in $I$, and thus $I$ cannot equal $I_{x_0}$.

8. For (a), let $a + b\sqrt{-5}$ be a nonzero member of $I$. Then $(a + b\sqrt{-5})(a - b\sqrt{-5}) = a^2 + 5b^2$ is a positive integer in $I$.

For (b), $I$ is an additive subgroup of $\mathbb{Z} + \mathbb{Z}\sqrt{-5}$, which is free abelian of rank 2. Therefore $I$ is free abelian of rank 1 or 2. We can rule out rank 1 because $I$ contains a nonzero integer and also the product of that integer and $\sqrt{-5}$.

For (c), a $\mathbb{Z}$ basis of $I$ consists of $x_1 = a_1 + b_1\sqrt{-5}$ and $x_2 = a_2 + b_2\sqrt{-5}$. Put $y_1 = rx_1 + sx_2 = (ra_1 + sa_2) + (rb_1 + sb_2)\sqrt{-5}$ and $y_2 = tx_1 + ux_2$, and aim to have $y_1, y_2$ form a $\mathbb{Z}$ basis with $y_1$ not involving $\sqrt{-5}$. We thus want $rb_1 + sb_2 = 0$, and the most economical way of achieving this equality is to put $d = \mathrm{GCD}(b_1, b_2)$ and to take $r = b_2d^{-1}$ and $s = -b_1d^{-1}$. Then $\mathrm{GCD}(r, s) = 1$, and we can choose $t$ and $u$ with $ru - st = 1$. With these choices we have $\begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = \begin{pmatrix} r & s \\ t & u \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$. Since $\det\begin{pmatrix} r & s \\ t & u \end{pmatrix} = 1$, this change is invertible. In other words, $y_1$ and $y_2$ form a $\mathbb{Z}$ basis in which $y_1$ is some nonzero integer $n$. We may assume that $n > 0$. Let $m$ be the smallest positive integer in $I$. Then $n$ must be a multiple of $m$ by an application of the division algorithm. Since $y_1$ and $y_2$ form a $\mathbb{Z}$ basis of $I$, we see that $n$ equals $m$.

9. It is straightforward to see that $P$ is an ideal and that $xy \in P$ implies $x \in P$ or $y \in P$. The ideal $P$ is proper since the presence of 1 in $\varphi^{-1}(P')$ would mean that $\varphi(1) = 1$ is in $P'$. But $P'$ is proper, and thus 1 is not in $P'$.

10. (a) $\{(r, 0) \mid r \in R\}$ and $\{(0, r) \mid r \in R\}$.

(b) $(X)$.

(c) $(X - 1)$ and $(X - 2)$.

(d) $(0)$.

11. For (a), $\mathbb{Q}[X]/I$ is a field and hence is a unique factorization domain. For (b), one can give a counterexample. The ring $\mathbb{Z}[\sqrt{-5}]$ is an integral domain and is the quotient of $\mathbb{Z}[X]$ by the ideal $(X^2 + 5)$; therefore $I = (X^2 + 5)$ is prime. On the other hand, $\mathbb{Z}[\sqrt{-5}]$ is not a unique factorization domain.
12. For (a), choose $x$ and $y$ with $xd + yc = 1$. Dividing by $n$ gives $xc^{-1} + yd^{-1} = n^{-1}$. Then (a) follows by multiplying through by $m$. Part (b) uses an induction. Group $n$ as $(p_1^{k_1}\cdots p_{r-1}^{k_{r-1}})\,p_r^{k_r}$ and apply (a) to write $mn^{-1} = a(p_1^{k_1}\cdots p_{r-1}^{k_{r-1}})^{-1} + bp_r^{-k_r}$. Repeat the process with $a(p_1^{k_1}\cdots p_{r-1}^{k_{r-1}})^{-1}$, and continue.
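The argument for Problem 12 is constructive, so it translates directly into a procedure for decomposing a rational number $m/n$ over the prime-power factors of $n$. The sketch below is one possible rendering, with helper names of our own choosing: part (a) becomes `split`, which uses the extended Euclidean algorithm to find $x$ and $y$ with $xd + yc = 1$, and part (b) becomes a loop that peels off one prime-power factor of $n$ at a time. No attempt is made to normalize the numerators, which may be negative or exceed their denominators.

```python
from fractions import Fraction

def ext_gcd(a, b):
    """Return (g, x, y) with x*a + y*b = g = gcd(a, b)."""
    if b == 0:
        return a, 1, 0
    g, x, y = ext_gcd(b, a % b)
    return g, y, x - (a // b) * y

def split(m, c, d):
    """Part (a): for gcd(c, d) = 1, return (a, b) with m/(cd) = a/c + b/d."""
    g, x, y = ext_gcd(d, c)      # x*d + y*c = 1
    assert g == 1, "c and d must be relatively prime"
    return m * x, m * y

def prime_powers(n):
    """The prime-power factors of n > 1, found by trial division."""
    out, p = [], 2
    while p * p <= n:
        if n % p == 0:
            q = 1
            while n % p == 0:
                n //= p
                q *= p
            out.append(q)
        p += 1
    if n > 1:
        out.append(n)
    return out

def partial_fractions(m, n):
    """Part (b): write m/n as a sum of terms a_j / q_j over the prime-power factors q_j of n."""
    qs = prime_powers(n)
    terms, num, rest = [], m, n
    for q in reversed(qs[1:]):   # peel off the last prime power each time
        rest //= q
        a, b = split(num, rest, q)
        terms.append((b, q))
        num = a
    terms.append((num, rest))
    return terms[::-1]

terms = partial_fractions(7, 60)
print(terms, sum(Fraction(a, q) for a, q in terms))
```

For $7/60$ this prints the terms over the denominators 4, 3, 5 together with their sum, which is $7/60$ again.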

13. For (a), proceed as in the argument in Section 4 until near the end, obtaining $x$ and $y$ just as in that construction. Then $\delta(x + y\sqrt{-2}) = x^2 + 2y^2 \le \tfrac14 + 2\cdot\tfrac14 = \tfrac34 < 1$. Then we have $\delta(r + s\sqrt{-2}) < \delta(c + d\sqrt{-2})$, and the argument goes through.

For (b), we would get $\delta(x + y\sqrt{-3}) = x^2 + 3y^2 \le \tfrac14 + 3\cdot\tfrac14 = 1$, and then the step $\delta(r + s\sqrt{-3}) < \delta(c + d\sqrt{-3})$ fails.
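The division step behind (a) can be made explicit: compute $\alpha/\beta$ exactly, round each coordinate to a nearest integer, and take the remainder. This Python sketch represents $a + b\sqrt{-2}$ as the pair `(a, b)`; the function names are hypothetical. Because each rounding error is at most $\tfrac12$, the remainder satisfies $\delta(\rho) \le \tfrac34\,\delta(\beta)$.

```python
import random

def norm(z):
    """delta(a + b*sqrt(-2)) = a^2 + 2b^2 for z = (a, b)."""
    a, b = z
    return a * a + 2 * b * b

def mul(z, w):
    """(a + b*sqrt(-2))(c + d*sqrt(-2)) = (ac - 2bd) + (ad + bc)*sqrt(-2)."""
    a, b = z
    c, d = w
    return (a * c - 2 * b * d, a * d + b * c)

def nearest(p, n):
    """An integer within 1/2 of p/n, for n > 0."""
    return (2 * p + n) // (2 * n)

def divide(alpha, beta):
    """Return (gamma, rho) with alpha = beta*gamma + rho and delta(rho) < delta(beta)."""
    a, b = alpha
    c, d = beta
    n = norm(beta)
    # alpha * conj(beta) = (ac + 2bd) + (bc - ad)*sqrt(-2); dividing by n gives alpha/beta.
    gamma = (nearest(a * c + 2 * b * d, n), nearest(b * c - a * d, n))
    product = mul(beta, gamma)
    return gamma, (a - product[0], b - product[1])

print(divide((7, 3), (2, 1)))   # ((3, 0), (1, 0)): 7 + 3*sqrt(-2) = (2 + sqrt(-2))*3 + 1
```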

14. The map extends to an $R$ module homomorphism by the universal mapping property of $RG$, and it is one-one onto by inspection. To check that it respects multiplication, it is enough to show that the product $g_1g_2$ in $RG$ maps to $f_{g_1} * f_{g_2}$, i.e., that $f_{g_1} * f_{g_2} = f_{g_1g_2}$. The computation is $(f_{g_1} * f_{g_2})(x) = \sum_{y \in G} f_{g_1}(xy^{-1})f_{g_2}(y) = f_{g_1}(xg_2^{-1})$, and this is 1 if and only if $xg_2^{-1} = g_1$, i.e., if and only if $x = g_1g_2$. For other values of $x$, it is 0. Therefore $(f_{g_1} * f_{g_2})(x) = f_{g_1g_2}(x)$ for all $x$.

15. Let the monic polynomial in question be $P(X)$, of degree $n$. We prove by induction on $m$ that any polynomial $A(X)$ in $I$ of degree $m$ is a multiple of $P(X)$. The base case of the induction is all polynomials of degree $< n$ in $I$; only 0 fits this description. Assume the result for all degrees $< m$, and let $A(X)$ be any polynomial in $I$ of degree $m$, say with leading term $a_mX^m$, $a_m \ne 0$. Then $a_mX^{m-n}P(X)$ is in $I$, and so is $B(X) = A(X) - a_mX^{m-n}P(X)$. The coefficient of $X^m$ in $B(X)$ is 0, and hence $B(X) = 0$ or else $\deg B(X) < m$. If $B(X) = 0$, then $A(X) = a_mX^{m-n}P(X)$, and $A(X)$ is a multiple of $P(X)$. If $\deg B(X) < m$, then induction gives $B(X) = C(X)P(X)$, and therefore $A(X) = (a_mX^{m-n} + C(X))P(X)$. So again $A(X)$ is a multiple of $P(X)$.

16. Let $p_1, \dots, p_n$ be $n$ distinct positive primes in $\mathbb{Z}$, put $q_k = p_1\cdots p_k$ for $0 \le k \le n$, and take $I_{n+1} = (q_n, q_{n-1}X, q_{n-2}X^2, \dots, q_0X^n)$. This can be written with $n + 1$ generators but not with fewer than that.

17. In (a), certainly $\ker\varphi \supseteq (y^2 - x^3)$. In the reverse direction, suppose that $\sum_n P_n(x)y^n$ is in $\ker\varphi$. Since $y^2 \equiv x^3 \bmod (y^2 - x^3)$, we can reduce this element of $\ker\varphi$ to the form $Q_0(x) + Q_1(x)y$. Substituting with $t$ gives $Q_0(t^2) + Q_1(t^2)t^3 = 0$. The first term involves only even powers of $t$, and the second term involves only odd powers. Thus each is 0 separately. We are thus to determine what members $Q_0(x)$ and $Q_1(x)y$ of $K[x, y]$ are in $\ker\varphi$. For $Q_0(t^2)$ to be 0, every coefficient of $Q_0$ must be 0. For $Q_1(t^2)t^3$ to be 0, every coefficient of $Q_1$ must be 0. Therefore only 0 is of the stated form, and every member of $\ker\varphi$ lies in $(y^2 - x^3)$.

For (b), image $\varphi$ contains $t^2$, $t^3$, and every power $t^n$ such that $n = 2a + 3b$ with $a$ and $b$ nonnegative integers. It follows that image $\varphi$ consists of all linear combinations of the powers $t^n$ with $n = 0$ or $n \ge 2$.

19. Write $A(X) = B(X)Q(X)$ in $F[X]$, and let $A(X) = c(A)(c(A)^{-1}A(X))$, $B(X) = c(B)(c(B)^{-1}B(X))$, and $Q(X) = c(Q)(c(Q)^{-1}Q(X))$ be the decompositions of Proposition 8.19. Then we have

$$c(A)\big(c(A)^{-1}A(X)\big) = c(B)c(Q)\big((c(B)^{-1}B(X))(c(Q)^{-1}Q(X))\big).$$

By Gauss's Lemma and the uniqueness in Proposition 8.19, we obtain $c(A)^{-1}A(X) = (c(B)^{-1}B(X))(c(Q)^{-1}Q(X))$, apart from unit factors. Therefore the member $B_0(X) = c(B)^{-1}B(X)$ of $R[X]$ is exhibited as dividing $A_0(X) = c(A)^{-1}A(X)$ with a quotient $c(Q)^{-1}Q(X)$ in $R[X]$.

20. Let $R$ be a finite integral domain, and let $a \ne 0$. Multiplication by $a$ is one-one since $R$ is an integral domain, and it must be onto $R$ by finiteness. Therefore there is some $b$ with $ab = 1$, and we have produced an inverse for $a$.

21. Let $R' = R/(p)$, and write bars for reduction modulo $p$. Suppose that $A(X) = B(X)C(X)$ nontrivially in $R[X]$ with $B(X) = b_kX^k + \cdots + b_0$, $C(X) = c_lX^l + \cdots + c_0$, and $k + l = N$. Since $p$ divides $a_0$ but $p^2$ does not, $p$ divides exactly one of $b_0$ and $c_0$, say the former. In $R'[X]$, we have $\bar A(X) = \bar a_NX^N$, $\bar C(X) = \bar c_lX^l + \cdots + \bar c_0$ with $\bar c_0 \ne 0$, and $\bar A(X) = \bar B(X)\bar C(X)$. Now $X$ is prime in $R'[X]$ by Problem 4, and $X^N$ divides $\bar B(X)\bar C(X)$ in $R'[X]$. Using the defining property of a prime, one power at a time, we find that $X^N$ divides $\bar B(X)$. Since $\deg B < N$, we must have $\bar B(X) = 0$ in $R'[X]$. Thus $p$ divides $b_k$ in $R$, and $p$ divides $a_N = b_kc_l$, contradiction.

22. In (a), we regard $WZ - XY$ as a first-degree polynomial in $W$, with $Z$ being a prime in the ring of coefficients. A nontrivial factorization of $WZ - XY$ must be of the form $A(X,Y,Z)\big(B(X,Y,Z)W + C(X,Y,Z)\big)$ with $Z = A(X,Y,Z)B(X,Y,Z)$. Since $Z$ is prime, one of these factors must be a unit, hence a scalar. If $A(X,Y,Z)$ is a scalar, then the factorization of $WZ - XY$ is trivial. Otherwise we may assume that the factorization is $WZ - XY = Z(W + C(X,Y,Z))$. Then $Z$ divides $XY$, and we arrive at a contradiction since $Z$ does not appear in $XY$.

In (b), we expand in cofactors about the top row. Using induction, we see that we can regard the determinant $\det[X_{ij}]$ as a first-degree polynomial in $X_{11}$ with an irreducible coefficient $P(X_{22}, X_{23}, \dots, X_{nn})$. A nontrivial factorization must be of the form $\det[X_{ij}] = PX_{11} + Q = A(BX_{11} + C)$, where $Q$, $A$, $B$, $C$ are polynomials in the remaining indeterminates. Then $AB = P$, and $P$ irreducible implies that $A$ or $B$ is a unit, hence a scalar. If $A$ is a scalar, our factorization of $\det[X_{ij}]$ is trivial. Otherwise we may assume that the factorization is $\det[X_{ij}] = PX_{11} + Q = P(X_{11} + C)$. Then $P$ must divide $Q$. Taking the degrees of homogeneity into account, we see that $Q$ must be the product of $P$ and a homogeneous polynomial of degree 1. Every term of $P$ is of the form $\prod_{i=2}^n X_{i,\sigma(i)}$ for some permutation $\sigma$ of $\{2, \dots, n\}$, and thus such a factor must appear in every term of $Q$. However, the only terms of $\det[X_{ij}]$ that contain a factor $\prod_{i=2}^n X_{i,\sigma(i)}$ also contain the factor $X_{11}$, and this factor is absent in $Q$. Thus the assumed reducibility has led to a contradiction.

23. The ideal of $\mathbb{Z}[X]$ generated by $A(X)$ and $B(X)$ consists of all polynomials $A(X)C(X) + B(X)D(X)$ with $C(X)$ and $D(X)$ in $\mathbb{Z}[X]$. If such an expression equals some nonzero integer $n$, then a GCD within $\mathbb{Q}[X]$ of $A(X)$ and $B(X)$ divides $A(X)$ and $B(X)$ and hence must divide $n$. It is therefore of degree 0 and is a unit in $\mathbb{Q}[X]$. Thus $A(X)$ and $B(X)$ are relatively prime in $\mathbb{Q}[X]$.

Conversely if $A(X)$ and $B(X)$ are members of $\mathbb{Z}[X]$ that are relatively prime in $\mathbb{Q}[X]$, we can find $P(X)$ and $Q(X)$ in $\mathbb{Q}[X]$ with $A(X)P(X) + B(X)Q(X) = 1$. Multiplying by a common denominator $n$ of the coefficients of $P(X)$ and $Q(X)$, we obtain a relation $A(X)C(X) + B(X)D(X) = n$ with all polynomials in $\mathbb{Z}[X]$. Thus $n$ is in the ideal of $\mathbb{Z}[X]$ generated by $A(X)$ and $B(X)$.


24. We are given the relations $C\begin{pmatrix} t_1 \\ t_2 \end{pmatrix} = 0$ on generators $t_1, t_2$, with coefficient matrix

$$C = \begin{pmatrix} 1+i & 2-i \\ 3 & 5i \end{pmatrix}.$$

Left multiplication on $C$ by a matrix with determinant a unit does not change the total set of conditions on $\begin{pmatrix} t_1 \\ t_2 \end{pmatrix}$, and right multiplication by such a matrix changes the generators but not the module they generate. In the first column of $C$, we observe that $\mathrm{GCD}(1+i, 3) = 1$ because $1+i$ divides 2 and $\mathrm{GCD}(2, 3) = 1$. Then we have $-(1-i)(1+i) + 1\cdot 3 = 1$, and we are led to the matrix $A = \begin{pmatrix} -(1-i) & 1 \\ -3 & 1+i \end{pmatrix}$, which has determinant 1. We can thus replace $C$ by $AC = \begin{pmatrix} 1 & -1+8i \\ 0 & -11+8i \end{pmatrix}$. An invertible column operation replaces the upper right entry by 0. Thus we are led to the diagonal matrix $\begin{pmatrix} 1 & 0 \\ 0 & -11+8i \end{pmatrix}$. In other words, we may assume that the $\mathbb{Z}[i]$ module was given to us with generators $t_1, t_2$ satisfying $t_1 = 0$ and $(-11 + 8i)t_2 = 0$. Therefore the given $\mathbb{Z}[i]$ module is cyclic and is $\mathbb{Z}[i]$ isomorphic to $\mathbb{Z}[i]/(-11 + 8i)$.
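The matrix arithmetic in Problem 24 is easy to double-check by machine. The sketch below represents Gaussian integers by Python complex numbers, which is exact here only because every entry is a small integer. For the column step it uses one particular choice of our own, adding $(1 - 8i)$ times the first column to the second, as an illustration of the invertible column operation mentioned above. As a further standard fact not stated in the text, $\mathbb{Z}[i]/(\alpha)$ has $N(\alpha)$ elements, so the module has $11^2 + 8^2 = 185$ elements.

```python
def matmul2(A, B):
    """Product of 2x2 matrices with Gaussian-integer entries (stored as Python complex)."""
    return [[A[i][0] * B[0][j] + A[i][1] * B[1][j] for j in range(2)] for i in range(2)]

def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

C = [[1 + 1j, 2 - 1j], [3, 5j]]          # coefficient matrix of the relations
A = [[-(1 - 1j), 1], [-3, 1 + 1j]]       # left factor with determinant 1
P = [[1, 1 - 8j], [0, 1]]                # column operation: col2 += (1 - 8i) * col1

AC = matmul2(A, C)
D = matmul2(AC, P)
print(det2(A), AC, D, det2(C))
```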


25. In (a), $\delta(z) = z\bar z$. Then $\delta(zw) = zw\overline{zw} = z\bar z\,w\bar w = \delta(z)\delta(w)$.

In (b), we start with two nonzero members $\alpha$ and $\beta$ of $R$. We are to find $\gamma$ and $\rho$ in $R$ with $\alpha = \beta\gamma + \rho$ and $\delta(\rho) < \delta(\beta)$. It is the same to find $\gamma$ and $\rho$ with $\alpha/\beta = \gamma + \rho/\beta$ and $\delta(\rho/\beta) < 1$. Apply the hypothesis with $z = \alpha/\beta$, and let $\gamma$ be the element $r$ such that $\delta(z - r) < 1$. Then $\rho$ may be defined as $\beta(z - r)$, and all the conditions are satisfied.

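26. Given $z = x + y\sqrt{-m}$, define $r = a + \tfrac12 b(1 + \sqrt{-m})$ in $\mathbb{Z}[\tfrac12(1 + \sqrt{-m})]$ by choosing $b$ to be an integer with $|2y - b| \le \tfrac12$ and then choosing $a$ to be an integer with $|x - a - \tfrac12 b| \le \tfrac12$. Since $|y - \tfrac12 b| \le \tfrac14$, we then have

$$\delta(z - r) = (x - a - \tfrac12 b)^2 + m(y - \tfrac12 b)^2 \le \tfrac14 + m\cdot\tfrac1{16} \le \tfrac14 + \tfrac{11}{16} < 1.$$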
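The rounding recipe of Problem 26 can be tested on a grid of rational points $z$. The sketch below assumes $m \le 11$, as the last inequality above requires, and uses $m = 3, 7, 11$. It works with exact `Fraction` arithmetic; Python's `round` applied to a `Fraction` returns a nearest integer, so both rounding conditions in the text are met. The function names are our own.

```python
from fractions import Fraction

def nearest_ring_element(x, y):
    """For z = x + y*sqrt(-m), choose r = a + b(1 + sqrt(-m))/2 as in the text:
    first b with |2y - b| <= 1/2, then a with |x - a - b/2| <= 1/2."""
    b = round(2 * y)
    a = round(x - Fraction(b, 2))
    return a, b

def delta_of_difference(x, y, a, b, m):
    """delta(z - r) = (x - a - b/2)^2 + m (y - b/2)^2."""
    return (x - a - Fraction(b, 2)) ** 2 + m * (y - Fraction(b, 2)) ** 2

grid = [Fraction(i, 16) for i in range(-32, 33)]
worst = {}
for m in (3, 7, 11):
    worst[m] = max(
        delta_of_difference(x, y, *nearest_ring_element(x, y), m)
        for x in grid
        for y in grid
    )
print(worst)   # largest value of delta(z - r) found on the grid, for each m
```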


27. In (a), complex conjugation is an automorphism of $\mathbb{Z}[i]$ and must therefore carry primes to primes.

In (b), we know that $(a + bi)(a - bi)$ is the integer $N(a + bi)$. Suppose that $N(a + bi) = mn$ nontrivially with $\mathrm{GCD}(m, n) = 1$. Since $a + bi$ is prime, it divides one of $m$ and $n$. Say that $m = (a + bi)(c + di)$. Then $m^2 = N(m) = N(a + bi)N(c + di) = mnN(c + di)$. Any prime number dividing $n$ must divide the left side $m^2$, and hence there can be no such prime. We conclude that $N(a + bi)$ does not have nontrivial relatively prime divisors. Hence it is a power of some prime number $p$.




In (c), let $N(a + bi) = p^k$. The left side is the product of two primes of $\mathbb{Z}[i]$. If $p$ is the product of $l$ primes of $\mathbb{Z}[i]$, then $p^k$ is the product of $kl$ primes. Then we must have $kl = 2$, and $k$ must divide 2.

In (d), suppose $N(a + bi) = p^2$, so that $k = 2$ in (c). Then $l = 1$, and $p$ is prime in $\mathbb{Z}[i]$.

28. The equation $N(a + bi) = p$ says that $a^2 + b^2 = p$. The right side is $\equiv 3 \bmod 4$, but 3 is not the sum of two squares modulo 4. Hence $N(a + bi) = p$ is impossible when $p \equiv 3 \bmod 4$. Problem 27c then forces $N(a + bi) = p^2$, and Problem 27d says that $p$ is prime in $\mathbb{Z}[i]$.

29. If $N(a + bi) = 2$, then $|a| = |b| = 1$, and we obtain $1 + i$ and its associates. If $N(a + bi) = 4$, then $a = \pm2$ with $b = 0$ or else $a = 0$ with $b = \pm2$; in these cases $a + bi$ is an associate of 2, which is $(1 + i)(1 - i)$ and is not prime in $\mathbb{Z}[i]$.

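30. The multiplicative group of $\mathbb{F}_p$ is cyclic of order $p - 1$. If $p$ is of the form $4n + 1$, then $\mathbb{F}_p^\times$ has order $4n$. The $n$th power of a generator then has order 4, so its square has order 2 and must be $-1$. Thus this $n$th power is an integer whose square is $\equiv -1 \bmod p$.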

31. For (a), we obtain $\varphi_1$ by mapping $\mathbb{Z}[X]$ to $\mathbb{F}_p[X]$ with a substitution homomorphism and following this with a passage to the quotient. Similarly $\varphi_2$ is obtained from the substitution homomorphism $\mathbb{Z}[X] \to \mathbb{Z}[i]$ followed by the passage to the quotient.

For (b), the kernel of $\varphi_1$ consists of all polynomials that are multiples of $X^2 + 1$ when their coefficients are taken modulo $p$. This is $p\mathbb{Z}[X] + (X^2 + 1)\mathbb{Z}[X] = (p, X^2 + 1)$. The kernel of $\varphi_2$ consists of all polynomials with the property that when taken modulo $X^2 + 1$, they are multiples of $p$. This too is the ideal $(p, X^2 + 1)$.

For (c), Problem 30 shows that the polynomial $X^2 + 1$ factors nontrivially in $\mathbb{F}_p[X]$. Therefore $X^2 + 1$ is not prime, the ideal $(X^2 + 1)$ is not prime, and $\mathbb{F}_p[X]/(X^2 + 1)$ is not an integral domain. By (b), $\mathbb{Z}[i]/(p)$ is not an integral domain, and the ideal $(p)$ is not prime. Hence $p$ is not prime in $\mathbb{Z}[i]$. By (c) and (d) in Problem 27, $p$ is of the form $N(a + bi)$ for some prime $a + bi$ in $\mathbb{Z}[i]$.

For (d), if we have $p = N(a + bi) = N(a' + b'i)$, we obtain two prime factorizations of $p$ in $\mathbb{Z}[i]$ as $p = (a + bi)(a - bi) = (a' + b'i)(a' - b'i)$, and unique factorization in $\mathbb{Z}[i]$ implies that $a' + b'i$ is an associate of $a + bi$ or $a - bi$.
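Problems 30 and 31 together suggest a way to actually find $a$ and $b$ with $a^2 + b^2 = p$ for a prime $p \equiv 1 \bmod 4$: first find $x$ with $x^2 \equiv -1 \bmod p$ as in Problem 30, then compute a greatest common divisor of $p$ and $x + i$ in $\mathbb{Z}[i]$ with the Euclidean algorithm. The gcd step is a standard technique rather than the argument given in the text, and the helper names below are our own; Gaussian integers are represented as integer pairs so that the arithmetic stays exact.

```python
def sqrt_minus_one(p):
    """For a prime p = 4n + 1, return x = g^n mod p with x^2 = -1 mod p.
    Problem 30 guarantees success when g generates F_p^x; the loop simply tries g = 2, 3, ..."""
    n = (p - 1) // 4
    for g in range(2, p):
        x = pow(g, n, p)
        if x * x % p == p - 1:
            return x
    raise ValueError("expected a prime p congruent to 1 mod 4")

def gmul(z, w):
    """Product of Gaussian integers z = (a, b), meaning a + bi, and w."""
    return (z[0] * w[0] - z[1] * w[1], z[0] * w[1] + z[1] * w[0])

def gnorm(z):
    return z[0] * z[0] + z[1] * z[1]

def gmod(a, b):
    """Remainder of a on division by b, rounding a/b to a nearest Gaussian integer."""
    n = gnorm(b)
    num = (a[0] * b[0] + a[1] * b[1], a[1] * b[0] - a[0] * b[1])   # a * conj(b)
    q = ((2 * num[0] + n) // (2 * n), (2 * num[1] + n) // (2 * n))
    bq = gmul(b, q)
    return (a[0] - bq[0], a[1] - bq[1])

def ggcd(a, b):
    """Euclidean algorithm in Z[i]; the result is determined only up to a unit."""
    while b != (0, 0):
        a, b = b, gmod(a, b)
    return a

def two_squares(p):
    """a, b >= 0 with a^2 + b^2 = p, read off from a gcd of p and x + i."""
    x = sqrt_minus_one(p)
    a, b = ggcd((p, 0), (x, 1))
    return abs(a), abs(b)

for p in (5, 13, 17, 29, 1009):
    print(p, two_squares(p))
```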

32.  For  (a),  multiply  C  on  the  left  by  the  matrix  A  that  is  the  identity  except  in  the 
first  column,  where  the  1th  entry  is  C,,-. 

For (b) and (c), the step of row reduction leads to a first column that is 0 in all entries but the first, where it is $\mathrm{GCD}(C_{11}, \dots, C_{nn})$. In other words, the new entry in position $(1,1)$ divides all entries in the new $C$. Therefore one step of column reduction leaves the entry unchanged in position $(1,1)$, leaves the remainder of the first column equal to 0, and makes the remainder of the first row equal to 0. What is left in the rows and columns other than the first is a matrix whose entries are all divisible by $\mathrm{GCD}(C_{11}, \dots, C_{nn})$. Hence we can induct on the size.

33. In (a), changing notation slightly from Lemma 8.26, write $AE = DB$ with $\det A$ and $\det B$ in $R^\times$. Over the field of fractions of $R$, the $m$-by-$n$ matrices $E$ and $D$ must have the same rank since $A$ and $B$ are invertible, and consequently $D$ and $E$ have the same number of nonzero diagonal entries. Thus for some $l$ with $0 \le l \le k$, we are given that $D_{jj}$ divides $D_{j+1,j+1}$ and $E_{jj}$ divides $E_{j+1,j+1}$ whenever $1 \le j < l$. Fix $i$ with $1 \le i \le l$, and consider all possible $i$-by-$i$ determinants that can be formed using the first $i$ rows of $B$ and one of the $\binom{n}{i}$ sets of $i$ columns. Since $\det B$ is in $R^\times$, it follows from the expansion-by-cofactors formula that these determinants have GCD equal to $1$. Each corresponding determinant for $DB$ equals $D_{11}\cdots D_{ii}$ times such a determinant, and hence the GCD for $DB$ is $D_{11}\cdots D_{ii}$.

Meanwhile, the GCD of the determinants for $A$ is also $1$, and, because of the divisibility property of the diagonal entries of $E$, $E_{11}\cdots E_{ii}$ divides each of the determinants for $AE$. Hence $E_{11}\cdots E_{ii}$ divides the GCD of the determinants for $AE$, which equals the GCD of the determinants for $DB$, which equals $D_{11}\cdots D_{ii}$. Thus $E_{11}\cdots E_{ii}$ divides $D_{11}\cdots D_{ii}$.

Arguing similarly with the determinants formed from the first $i$ columns of $A$, $AE$, $B$, and $DB$, we see that $D_{11}\cdots D_{ii}$ divides $E_{11}\cdots E_{ii}$. Therefore $D_{11}\cdots D_{ii}$ and $E_{11}\cdots E_{ii}$ are associates for $1 \le i \le l$. Since none of the factors in question is $0$, we see that each of the first $l$ diagonal entries of $D$ is an associate of the corresponding diagonal entry of $E$. This proves the desired uniqueness.
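The uniqueness argument rests on GCDs of $i$-by-$i$ determinants. A closely related standard invariant, sketched below in Python for integer matrices, is the GCD $\Delta_i$ of all $i$-by-$i$ minors: it is unchanged by multiplication by matrices of determinant $\pm 1$, and for a diagonal matrix whose entries divide one another it equals $D_{11}\cdots D_{ii}$. The helper names are ours, and the cofactor-expansion determinant is meant only for small matrices.

```python
from functools import reduce
from itertools import combinations
from math import gcd

def det(M):
    """Integer determinant by cofactor expansion along the first row."""
    if not M:
        return 1
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def determinantal_divisors(M):
    """Delta_i = GCD of all i-by-i minors of M, for i = 1, ..., min(rows, cols)."""
    m, n = len(M), len(M[0])
    result = []
    for i in range(1, min(m, n) + 1):
        minors = (det([[M[r][c] for c in cols] for r in rows])
                  for rows in combinations(range(m), i)
                  for cols in combinations(range(n), i))
        result.append(reduce(gcd, minors, 0))
    return result

def invariant_factors(M):
    """Quotients Delta_i / Delta_{i-1}, stopping at the first Delta_i equal to 0."""
    factors, prev = [], 1
    for delta in determinantal_divisors(M):
        if delta == 0:
            break
        factors.append(delta // prev)
        prev = delta
    return factors

D = [[4, 0, 0], [0, 6, 0], [0, 0, 10]]
A = [[1, 1, 0], [0, 1, 0], [0, 0, 1]]      # sample matrix with det A = 1
AD = [[sum(A[i][k] * D[k][j] for k in range(3)) for j in range(3)] for i in range(3)]
print(determinantal_divisors(D), invariant_factors(D), invariant_factors(AD))
```

Both $D$ and the product `AD` give invariant factors $2, 2, 60$, as the invariance predicts.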

34. For (a), we have seen in this setting that the decomposition of $V$ as a direct sum of cyclic $K[X]$ modules means a decomposition of $V$ as a direct sum of vector subspaces, each of which is invariant under $L$. Also, if $V_0$ is one of these vector subspaces, the cyclic nature of the module means that there is some vector $v_0$ in $V_0$ such that $K[X]v_0 = V_0$, and the diagonal entry of the matrix $D$ in the proof of Theorem 8.25 is a polynomial $M(X)$ such that $V_0 \cong K[X]/(M(X))$ as a $K[X]$ module. Referring to Problems 26-31 of Chapter V, we see that $v_0$ is a cyclic vector for the cyclic subspace $V_0$, and $M(X)$ is the minimal polynomial of $L$ on this subspace.

The  divisibility  property  of  the  minimal  polynomials  and  also  the  uniqueness 
assertion  now  follow  from  what  has  been  proved  in  Problems  32-33.  We  know 
from  Problem  28  in  Chapter  V  that  the  data  of  a  cyclic  subspace  and  the  minimal 
polynomial  yield  a  particular  matrix  for  the  linear  mapping  and  hence  determine 
the  linear  mapping  on  that  subspace  up  to  similarity.  Consequently  the  uniqueness 
statement  that  has  just  been  observed  says  that  L  is  determined  up  to  similarity  by 
the  integer  r  and  the  sequence  of  minimal  polynomials. 
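In concrete terms, the data of (a) yield a block-diagonal matrix with one companion block per cyclic subspace. The Python sketch below uses one common companion-matrix convention (1's just below the diagonal, negated coefficients in the last column), which may differ from the exact matrix in Problem 28 of Chapter V, and checks that a block is annihilated by its polynomial. All names, and the sample minimal polynomials $X - 1$ and $X^2 - 1$, are ours.

```python
def companion(coeffs):
    """Companion matrix of X^n + c_{n-1}X^{n-1} + ... + c_0, coeffs = [c_0, ..., c_{n-1}].
    Assumed convention: 1's just below the diagonal and -c_0, ..., -c_{n-1} in the
    last column, so the first standard basis vector is a cyclic vector."""
    n = len(coeffs)
    C = [[0] * n for _ in range(n)]
    for i in range(1, n):
        C[i][i - 1] = 1
    for i in range(n):
        C[i][n - 1] = -coeffs[i]
    return C

def mat_mul(P, Q):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Q)] for row in P]

def poly_at_matrix(coeffs, C):
    """Evaluate the monic polynomial with lower coefficients coeffs at C (Horner's rule)."""
    n = len(C)
    I = [[int(i == j) for j in range(n)] for i in range(n)]
    result = I
    for c in reversed(coeffs):
        result = mat_mul(result, C)
        result = [[result[i][j] + c * I[i][j] for j in range(n)] for i in range(n)]
    return result

def block_diag(blocks):
    """Assemble square blocks along the diagonal."""
    n = sum(len(b) for b in blocks)
    M = [[0] * n for _ in range(n)]
    offset = 0
    for b in blocks:
        for i, row in enumerate(b):
            for j, x in enumerate(row):
                M[offset + i][offset + j] = x
        offset += len(b)
    return M

# Sample data: minimal polynomials X - 1 and X^2 - 1 (the first divides the second).
R = block_diag([companion([-1]), companion([-1, 0])])
print(R)
```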

35. Let $A$ and $B$ be members of $M_n(K)$. Form the data for each from the rational canonical form in Problem 34. Now consider everything as involving vector spaces over the larger field $L$. We are given that the two matrices are similar over $L$, i.e., are conjugate via $GL(n, L)$. Problem 34 shows that the respective decompositions have the same data. The two matrices still have the same data when we again consider the field to be $K$. Hence they are similar over $K$, i.e., are conjugate via $GL(n, K)$.

36. The fact that the homomorphisms are isomorphisms follows from the composition rule.


Chapter  VIII 




38. In (b), we can write any member of $F[X_1, \dots, X_n, X]$ as
$$A_n(X_1, \dots, X_n)X^n + \cdots + A_1(X_1, \dots, X_n)X + A_0(X_1, \dots, X_n),$$
and $\sigma^{**}$ acts by having $\sigma^*$ act on each coefficient. Invariance under all $\sigma^{**}$'s therefore means that each coefficient is invariant under all $\sigma^*$'s and hence is a symmetric polynomial.

39. In (a), if, for example, $i < j$ and $k_i < k_j$, then the monomial $aX_1^{k_1}\cdots X_n^{k_n}$ is increased in the ordering by replacing the factors $X_i^{k_i}X_j^{k_j}$ by $X_i^{k_j}X_j^{k_i}$.

For (b), we need only take the largest monomial in the $j$th elementary symmetric polynomial, raise it to the $c_j$ power, and multiply the results over $j$.

For (c), let the largest monomial in $A$ be $aX_1^{k_1}\cdots X_n^{k_n}$. To define $M$, choose $r = a$ and define $c_j = k_j - k_{j+1}$ for $1 \le j < n$ and $c_n = k_n$.

For (d), the construction in (c) yields $0$ coefficient for $X_1^{k_1}\cdots X_n^{k_n}$ in $A - rM$, and $A - rM$ has no larger monomials. So if $A - rM \ne 0$, the largest monomial of $A - rM$ is below that monomial $X_1^{k_1}\cdots X_n^{k_n}$.

For (e), iteration of the construction in (c) and (d) shows that any homogeneous symmetric polynomial equals a homogeneous polynomial in the elementary symmetric polynomials; the iteration stops because there are only finitely many monomials of a given degree. Problem 37 shows that any symmetric polynomial is a linear combination of homogeneous symmetric polynomials, and hence every symmetric polynomial is a polynomial in the elementary symmetric polynomials.

40. Suppose that $z_0$ and $w_0$ in $\mathbb{C}^n$ have $P(z_0) \ne 0$ and $P(w_0) \ne 0$. As a function of $t \in \mathbb{C}$, $P(z_0 + t(w_0 - z_0))$ is a polynomial function nonvanishing at $t = 0$ and $t = 1$. The subset of $t \in \mathbb{C}$ where it vanishes is finite, and its complement in $\mathbb{C}$ is necessarily pathwise connected and therefore connected. The image of this complement under the continuous map $t \mapsto z_0 + t(w_0 - z_0)$ is therefore a connected subset of $\mathbb{C}^n$ on which $P$ is nonvanishing and which contains $z_0$ and $w_0$. Taking the union of these connected sets with $z_0$ fixed and $w_0$ varying, we see that the set of $w_0 \in \mathbb{C}^n$ where $P(w_0) \ne 0$ is connected.

41. For (a), two applications of the formula relating Pfaffians and determinants give us $\operatorname{Pfaff}(A^tXA)^2 = \det(A^tXA) = (\det A)^2\det X = (\det A)^2\operatorname{Pfaff}(X)^2$. Taking the square root gives the desired result.

For (b), we fix $X$ with $\operatorname{Pfaff}(X) \ne 0$ and allow $A$ to vary. On the set where $\det A \ne 0$, the function $A \mapsto \operatorname{Pfaff}(A^tXA)/\det A$ is a continuous function with image in the two-point set $\{\pm\operatorname{Pfaff}(X)\}$, by (a). The domain of the function is connected by Problem 40, and therefore the image has to be connected. Hence the function has to be constant. Checking the value of the function at $A = I$, we see that the function has to be constantly equal to $\operatorname{Pfaff}(X)$.

42. Form the ring $S = \mathbb{Z}[\{A_{ij}\}, \{X_{ij}\}]$. We can then regard $\operatorname{Pfaff}(A^tXA)$ and $(\det A)\operatorname{Pfaff}(X)$ as two members of $S$. If we fix arbitrary elements $a_{ij} \in \mathbb{Z}$ for all $i$ and $j$ and also $x_{ij} \in \mathbb{Z}$ for $i < j$, then Proposition 4.30 gives us a unique substitution homomorphism $\Psi: S \to \mathbb{Z}$ such that $\Psi(1) = 1$, $\Psi(A_{ij}) = a_{ij}$, and $\Psi(X_{ij}) = x_{ij}$. Assemble the $a_{ij}$ and $x_{ij}$ into matrices $a = [a_{ij}]$ and $x = [x_{ij}]$ with $x$ alternating. Problem 41b shows that the identity in question holds when the entries are in $\mathbb{C}$, and in particular it holds when the entries are in $\mathbb{Z}$. Therefore $\operatorname{Pfaff}(a^txa) = (\det a)\operatorname{Pfaff}(x)$. Since $\mathbb{Z}$ is an integral domain and since $a$ and $x$ are arbitrary with $x$ alternating, Corollary 4.32 allows us to conclude that $\operatorname{Pfaff}(A^tXA) = (\det A)\operatorname{Pfaff}(X)$ as an equality in $S$.

To pass from $S$ to $K$, let $1_K$ be the identity of $K$, and let $\Psi_1: \mathbb{Z} \to K$ be the unique homomorphism of rings such that $\Psi_1(1) = 1_K$. If we fix arbitrary elements $a_{ij}$ of $K$ for all $i$ and $j$, as well as arbitrary elements $x_{ij}$ of $K$ for $i < j$, then Proposition 4.30 gives us a unique substitution homomorphism $\Phi: S \to K$ such that $\Phi(1) = \Psi_1(1) = 1_K$, $\Phi(A_{ij}) = a_{ij}$ for all $i$ and $j$, and $\Phi(X_{ij}) = x_{ij}$ whenever $i < j$. Applying $\Phi$ to our identity in $S$, we obtain $\operatorname{Pfaff}(a^txa) = (\det a)\operatorname{Pfaff}(x)$ as an equality in $K$.

43. From Problem 42 and the hypothesis on $g$, we have $1 = \operatorname{Pfaff}(J) = \operatorname{Pfaff}(g^tJg) = (\det g)\operatorname{Pfaff}(J) = \det g$. Hence $\det g = 1$.
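The identities of Problems 41 through 43 are easy to test numerically over $\mathbb{Z}$. The Python sketch below computes Pfaffians by expansion along the first row and checks $\operatorname{Pfaff}(A^tXA) = (\det A)\operatorname{Pfaff}(X)$ and $\operatorname{Pfaff}(X)^2 = \det X$ for sample matrices of our own choosing. It assumes $J$ is block diagonal with $2$-by-$2$ blocks $\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$, a convention for which $\operatorname{Pfaff}(J) = 1$; the book's $J$ may be arranged differently, and the matrix `g` is only a sample.

```python
def det(M):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if not M:
        return 1
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def pfaffian(M):
    """Pfaffian of an alternating matrix by expansion along the first row."""
    n = len(M)
    if n == 0:
        return 1
    if n % 2 == 1:
        return 0
    total = 0
    for j in range(1, n):
        keep = [k for k in range(n) if k not in (0, j)]
        sub = [[M[r][s] for s in keep] for r in keep]
        total += (1 if j % 2 == 1 else -1) * M[0][j] * pfaffian(sub)
    return total

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(P, Q):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Q)] for row in P]

X = [[0, 1, 2, 3], [-1, 0, 4, 5], [-2, -4, 0, 6], [-3, -5, -6, 0]]
A = [[1, 2, 0, 1], [0, 1, 3, 0], [2, 0, 1, 1], [1, 1, 0, 2]]
lhs = pfaffian(matmul(transpose(A), matmul(X, A)))
print(lhs, det(A) * pfaffian(X), pfaffian(X) ** 2 == det(X))

# Assumed J: block diagonal with 2-by-2 blocks [[0, 1], [-1, 0]], so Pfaff(J) = 1.
J = [[0, 1, 0, 0], [-1, 0, 0, 0], [0, 0, 0, 1], [0, 0, -1, 0]]
g = [[2, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 3], [0, 0, 0, 1]]   # sample with g^t J g = J
print(matmul(transpose(g), matmul(J, g)) == J, det(g))
```

The sample `g` is block diagonal with $2$-by-$2$ blocks of determinant $1$, and for such blocks $M^t\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}M = (\det M)\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$, so it satisfies the hypothesis of Problem 43.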

45. For (a), if $\varphi: R \to R/P^k$ is the quotient homomorphism, then $\varphi^{-1}$ of any ideal of $R/P^k$ is an ideal $I$ of $R$ containing $P^k$. If $Q$ is a prime ideal dividing $I$, then $Q$ divides $P^k$, and it follows that $Q = P$. Thus the only possibilities for $I$ are the powers $P^i$ of $P$, necessarily stopping with $i = k$.

For (b), we know that $\pi^i$ lies in $P^i$ but not $P^{i+1}$. For $1 \le i \le k-1$, it follows that the principal ideal $(\pi^i + P^k)/P^k$ is contained in the ideal $P^i/P^k$ but not in $P^{i+1}/P^k$. Since the ideals $P^j/P^k$ for $j \le k$ are nested and there are no other ideals in $R/P^k$, we must have $(\pi^i + P^k)/P^k = P^i/P^k$. Thus $P^i/P^k$ is principal.

46. Corollary 8.63 and Problem 44 together show that every ideal of $R/I$ is principal if it can be shown that every ideal of $R/P^k$ is principal when $P$ is a nonzero prime ideal. The two parts of Problem 45 together show that every ideal of $R/P^k$ is principal.

47. We may assume that $(a) \subsetneq I$ since otherwise the result follows with $b = 0$. Since $a \ne 0$, the ideal $I/(a)$ in $R/(a)$ is a principal ideal by Problem 46c. If $b_0$ is a generator of this ideal, then $(R/(a))b_0 = I/(a)$. Since $b_0$ is in $I/(a)$, we can write it as $b_0 = b + (a)$ for some $b$ in $I$. Every member of $I/(a)$ is then of the form $(r + (a))(b + (a)) = rb + (a)$, and we conclude that every member of $I$ is of the form $rb + sa$ with $r$ and $s$ in $R$.

48.  Any  R  submodule  of  R  is  an  ideal. 

49. Write $M = Rx_1 + \cdots + Rx_n$ with $x_1, \dots, x_n$ in $F$. Each $x_i$ is of the form $r_is_i^{-1}$ with $r_i$ and $s_i$ in $R$ and with $s_i \ne 0$. Then $aM$ lies in $R$ for $a = \prod_{i=1}^n s_i$. So $aM$ is an ideal in $R$, by Problem 48. If $N$ is a second fractional ideal, choose $b \ne 0$ such that $bN$ is an ideal in $R$. Then $(aM)(bN)$ is an ideal in $R$, and the formula $MN = (ab)^{-1}(aM)(bN)$ shows that $MN$ is a fractional ideal.

50. Since $I$ is a finitely generated $R$ module, we can write $I = Ra_1 + \cdots + Ra_n$ with all $a_i$ in $R$. The condition for $x \in F$ to be in $I^{-1}$ is that $xI \subseteq R$, and it is necessary and sufficient that $xa_i$ be in $R$ for all $i$. Thus it is necessary that $x$ be in $(a_1\cdots a_n)^{-1}R$. Consequently $I^{-1}$ is an $R$ submodule of the singly generated $R$ module $(a_1\cdots a_n)^{-1}R$. Since $R$ is Noetherian, $I^{-1}$ is finitely generated.

51. If $I$ is maximal among the nonzero ideals of $R$ for which there is no fractional ideal $M$ of $F$ with $IM = R$, then Lemma 8.58 shows that $I$ is not prime. Choose a nonzero prime ideal $P$ with $I \subseteq P$. Then Lemma 8.58 and the definitions give $I \subseteq IP^{-1} \subseteq PP^{-1} \subseteq R$. We cannot have $I = IP^{-1}$ since otherwise $IP = (IP^{-1})P = I(P^{-1}P) = I$ and Proposition 8.52 gives $I = 0$. By maximality of $I$, we can find some fractional ideal $N$ with $(IP^{-1})N = R$. Then $I(P^{-1}N) = R$, and we can take $M = P^{-1}N$, by Problem 49.

52. Every member $x$ of $M$ has $xI \subseteq R$, and thus $M \subseteq I^{-1}$. On the other hand, if $x$ is in $I^{-1}$, then $xI \subseteq R$, so $x \in xIM \subseteq RM = M$, and $x$ is in $M$.

53. If $M$ is a fractional ideal, then Problem 49 produces $c \ne 0$ in $F$ with $cM \subseteq R$, and Problem 48 shows that $cM$ is an ideal of $R$. Using Problem 52, we can write $M = (c)^{-1}(c)M = (c)^{-1}(cM)$. This proves that $M = J^{-1}I$ for ideals $I$ and $J$. Then (a) follows from Theorem 8.55 and Problem 52, and (b) follows from Problem 52.


Chapter  IX 

1. The equation for $r$ gives $r^3 = 3r - 4$ and $r^4 = 3r^2 - 4r$. Therefore the inverse has
$$1 = (r^2+r+1)(ar^2+br+c) = ar^4 + (a+b)r^3 + (a+b+c)r^2 + (b+c)r + c = r^2(4a+b+c) + r(-a+4b+c) + 1(-4a-4b+c),$$
and we are led to the system of linear equations
$$\begin{aligned} 4a + b + c &= 0, \\ -a + 4b + c &= 0, \\ -4a - 4b + c &= 1. \end{aligned}$$
Then $(a, b, c) = (-\tfrac{3}{49}, -\tfrac{5}{49}, \tfrac{17}{49})$, and $(r^2+r+1)^{-1} = -\tfrac{3}{49}r^2 - \tfrac{5}{49}r + \tfrac{17}{49}$.
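The computation can be checked with exact rational arithmetic. In the Python sketch below (function names ours), an element $c_0 + c_1r + c_2r^2$ of $\mathbb{Q}(r)$ is a list of `Fraction`s, products are reduced with $r^3 = 3r - 4$, and the inverse is found by solving the same $3$-by-$3$ linear system by Gauss-Jordan elimination.

```python
from fractions import Fraction as Fr

def mul_mod(p, q):
    """Multiply elements of Q(r), stored as [c0, c1, c2] = c0 + c1 r + c2 r^2,
    reducing with r^3 = 3r - 4."""
    prod = [Fr(0)] * (len(p) + len(q) - 1)
    for i, x in enumerate(p):
        for j, y in enumerate(q):
            prod[i + j] += x * y
    for d in range(len(prod) - 1, 2, -1):   # r^d = 3 r^(d-2) - 4 r^(d-3)
        top, prod[d] = prod[d], Fr(0)
        prod[d - 2] += 3 * top
        prod[d - 3] -= 4 * top
    return (prod + [Fr(0)] * 3)[:3]

def solve(A, rhs):
    """Gauss-Jordan elimination over the rationals (A assumed invertible)."""
    n = len(A)
    M = [list(row) + [b] for row, b in zip(A, rhs)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def inverse(x):
    """Solve x * y = 1: column j of the matrix is x * r^j, reduced."""
    cols = [mul_mod(x, [Fr(int(i == j)) for i in range(3)]) for j in range(3)]
    A = [[cols[j][i] for j in range(3)] for i in range(3)]
    return solve(A, [Fr(1), Fr(0), Fr(0)])

x = [Fr(1), Fr(1), Fr(1)]          # r^2 + r + 1
y = inverse(x)
print(y)                            # coefficients in the order [c, b, a]
```

The result is $[17/49, -5/49, -3/49]$ in the order $[c, b, a]$, and multiplying back gives $1$.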

2. Multiplication by a nonzero $r$ is a one-one $F$ linear mapping from the $F$ vector space $R$ into itself. Since $\dim_F R < \infty$, this linear mapping must be onto. The element $s$ such that $rs = 1$ is a multiplicative inverse of $r$.

3. Let $z_0$ be a nonreal element of $K$. Then the closure of the $\mathbb{Q}$ vector space $\mathbb{Q} + \mathbb{Q}z_0$ contains $\mathbb{R} + \mathbb{R}z_0 = \mathbb{C}$.

4. If $y = F(x)/G(x)$, then $G(x)y = F(x)$. Arranging the terms as powers of $x$ with coefficients of the form $ay + b$ with $a$ and $b$ in $k$, we see that $x$ is a root of a polynomial in one indeterminate over $k(y)$. Therefore $x$ is algebraic over $k(y)$.

5. The condition is that $N$ be the square of an integer. For any other $N$, $X^2 - N$ is irreducible over $\mathbb{Q}$, and $[\mathbb{Q}(\sqrt{N}) : \mathbb{Q}] = 2$. Since $2$ does not divide $3$, $\mathbb{Q}(\sqrt{N})$ cannot be a subfield of $\mathbb{Q}(\sqrt[3]{2})$.
6. $X \mapsto X + 1$.

7.  No,  since  8  is  not  a  power  of  4.  See  Corollary  9.19. 

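8. Let $g$ be a generator of the cyclic group $K^\times$, and let $q$ be the order of $K$. Then
$$g \cdot g^2 \cdot g^3 \cdots g^{q-1} = g^{1+2+3+\cdots+(q-1)} = g^{q(q-1)/2}.$$
If $q$ is even, then $K$ has characteristic $2$, and this is $(g^{q-1})^{q/2} = 1^{q/2} = 1 = -1$. If $q$ is odd, it is $(g^{(q-1)/2})^q = (-1)^q = -1$, since $g^{(q-1)/2}$ has order $2$ and $-1$ is the only element of order $2$ in $K^\times$.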
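The conclusion, that the product of all nonzero elements of a finite field is $-1$, can be checked directly for prime fields (Wilson's Theorem) and for a field of order $9$. As an assumption of this Python sketch, the field of order $9$ is realized as $\mathbb{F}_3[X]/(X^2+1)$ (valid because $X^2+1$ has no root modulo $3$), with a pair $(a, b)$ standing for $a + bX$; the function names are ours.

```python
from functools import reduce

def prime_field_product(p):
    """Product of 1, 2, ..., p - 1 modulo p."""
    return reduce(lambda acc, x: acc * x % p, range(1, p), 1)

def f9_mul(u, v):
    """Multiply a + bX and c + dX in F_3[X]/(X^2 + 1), using X^2 = -1."""
    a, b = u
    c, d = v
    return ((a * c - b * d) % 3, (a * d + b * c) % 3)

f9_nonzero = [(a, b) for a in range(3) for b in range(3) if (a, b) != (0, 0)]
f9_product = reduce(f9_mul, f9_nonzero, (1, 0))

print([prime_field_product(p) for p in (2, 3, 5, 7, 11)], f9_product)
```

In each case the product is $-1$, which appears as $p - 1$ for $\mathbb{F}_p$ and as the pair $(2, 0)$ for the field of order $9$.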

9. Proof 1: Let $F(X) = X^n + c_{n-1}X^{n-1} + \cdots + c_0$ be the minimal polynomial of $r$. We are given that $n$ is odd. Write the equation $F(r) = 0$ as
$$r(r^{n-1} + c_{n-2}r^{n-3} + \cdots + c_1) = -c_{n-1}r^{n-1} - c_{n-3}r^{n-3} - \cdots - c_0.$$
Then $r$ is expressed as an element of $k(r^2)$ unless $r^{n-1} + c_{n-2}r^{n-3} + \cdots + c_1 = 0$. But this expression cannot be $0$ because this polynomial has degree $n - 1$ and the minimal polynomial for $r$ has degree $n$.

Proof 2: The element $r$ of $K$ is a root of the polynomial $X^2 - r^2$ in $k(r^2)[X]$, and hence $[k(r) : k(r^2)] \le 2$. Since $[k(r) : k] = [k(r) : k(r^2)]\,[k(r^2) : k]$ with the left side odd by assumption, $[k(r) : k(r^2)]$ has to be odd. Thus it is $1$.

10. Let $d_r = [k(r) : k]$ and $d_s = [k(s) : k]$. Since $K$ contains $k(r)$ and $k(s)$, we see that $d_r$ and $d_s$ divide $[K : k]$. Since $\mathrm{GCD}(d_r, d_s) = 1$, $d_rd_s$ divides $[K : k]$. The minimal polynomial $M(X)$ of $r$ over $k$ is a polynomial over $k(s)$ such that $M(r) = 0$. Thus the minimal polynomial $N(X)$ of $r$ over $k(s)$ divides $M(X)$. If $c$ is the degree of $N(X)$, we then have $c \le d_r$. Since $d_rd_s$ divides $[K : k]$, we obtain
$$d_rd_s \le [K : k] = [k(r, s) : k] = [k(s)(r) : k] = c\,[k(s) : k] = c\,d_s \le d_rd_s.$$
Equality must hold throughout. Equality at the right end says that $c = d_r$, and this proves (a). Equality at the left end says that $d_rd_s = [K : k]$, and this proves (b).

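11. In (a), we have $\gamma = \beta + c\alpha = \beta(1 + c\omega)$. Here $r = 1 + c\omega$ lies in $\mathbb{Q}(\sqrt{-3})$, and so does $r^3$. Therefore $r^3$ is a root of a quadratic polynomial $Y^2 + pY + q$ over $\mathbb{Q}$. Then
$$\gamma^6 + a\gamma^3 + b = r^6\beta^6 + ar^3\beta^3 + b = 4r^6 + 2ar^3 + b = 4\left(r^6 + \tfrac12 ar^3 + \tfrac14 b\right),$$
and the right side is $0$ if $a$ and $b$ are chosen such that $p = \tfrac12 a$ and $q = \tfrac14 b$.

In (b), $\gamma = \beta + \alpha = \beta(1 + \omega)$, and $\gamma^3 = \beta^3\left(\tfrac12(1 + \sqrt{-3})\right)^3 = 2(-1) = -2$. Then $\gamma$ satisfies $\gamma^3 + 2 = 0$, and $X^3 + 2$ is irreducible since $-2$ is not a cube in $\mathbb{Q}$.

In (c), the field $\mathbb{Q}(\gamma)$ contains
$$\gamma^3 = \beta^3\left(\tfrac12(3 - \sqrt{-3})\right)^3 = \tfrac14(3 - \sqrt{-3})^3 = \tfrac14\left(27 - 27\sqrt{-3} + 9(-3) - (-3)\sqrt{-3}\right) = -6\sqrt{-3}.$$
Thus $\mathbb{Q}(\sqrt{-3})$ is a subfield of $\mathbb{Q}(\gamma)$, and $2$ divides $[\mathbb{Q}(\gamma) : \mathbb{Q}]$. Since $\mathbb{Q}(\sqrt{-3})$ is a subfield, $\beta = \gamma(1 - \omega)^{-1}$ lies in $\mathbb{Q}(\gamma)$. Thus $\mathbb{Q}(\sqrt[3]{2})$ is a subfield of $\mathbb{Q}(\gamma)$, and $3$ divides $[\mathbb{Q}(\gamma) : \mathbb{Q}]$. Consequently $6$ divides $[\mathbb{Q}(\gamma) : \mathbb{Q}]$, and the minimal polynomial of $\gamma$ has degree $\ge 6$. By (a), it has degree exactly $6$.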
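A floating-point check of (a) through (c) is sketched below in Python, under assumptions we infer from the computations above: $\beta$ is the real cube root of $2$, $\omega = \tfrac12(-1 + \sqrt{-3})$, $\alpha = \beta\omega$, and $\gamma = \beta + c\alpha$. The element treated in (c) corresponds to $c = -1$, since $1 - \omega = \tfrac12(3 - \sqrt{-3})$. The function names are ours.

```python
import cmath
import math

beta = 2 ** (1 / 3)                    # assumed: real cube root of 2
omega = cmath.exp(2j * math.pi / 3)    # assumed: (-1 + sqrt(-3)) / 2

def gamma(c):
    """gamma = beta + c*alpha = beta*(1 + c*omega), with alpha = beta*omega."""
    return beta * (1 + c * omega)

def sextic(c):
    """Coefficients (a, b) of X^6 + aX^3 + b from part (a): r^3 is a root of
    Y^2 + pY + q with r = 1 + c*omega, and then a = 2p, b = 4q."""
    r3 = (1 + c * omega) ** 3
    p = -(r3 + r3.conjugate()).real
    q = (r3 * r3.conjugate()).real
    return 2 * p, 4 * q

a, b = sextic(-1)
g = gamma(-1)
print(a, b, g ** 3, abs(g ** 6 + a * g ** 3 + b))
```

For $c = -1$ the sketch gives $a = 0$ and $b = 108$ (up to rounding), so $\gamma$ is a root of $X^6 + 108$; for $c = 1$ it gives $X^6 + 4X^3 + 4 = (X^3 + 2)^2$, in line with (b).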
12. Let the characteristic be $p$. If $F(X)$ has $F'(X) = 0$, then all the exponents of $X$ appearing in $F(X)$ are multiples of $p$. Let $F(X) = a_nX^{np} + a_{n-1}X^{(n-1)p} + \cdots + a_1X^p + a_0$. Since the Frobenius map is onto in the case of a finite field, we can choose members $c_n, \dots, c_0$ of $k$ such that $c_n^p = a_n, c_{n-1}^p = a_{n-1}, \dots, c_0^p = a_0$. Put $G(X) = c_nX^n + c_{n-1}X^{n-1} + \cdots + c_0$. Then $F(X) = G(X)^p$, and $F(X)$ is reducible.
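Over a prime field $\mathbb{F}_p$ the Frobenius map $c \mapsto c^p$ is the identity, so the $c_j$ can be taken equal to the $a_j$. The Python sketch below (names ours; finite fields of non-prime order are not handled) extracts $G$ from $F$ and checks $G(X)^p = F(X)$ with coefficients reduced modulo $p$, for the sample $F(X) = X^6 + 2X^3 + 1$ over $\mathbb{F}_3$.

```python
def poly_mul(f, g, p):
    """Multiply coefficient lists (constant term first) with coefficients mod p."""
    out = [0] * (len(f) + len(g) - 1)
    for i, x in enumerate(f):
        for j, y in enumerate(g):
            out[i + j] = (out[i + j] + x * y) % p
    return out

def pth_root(F, p):
    """For F over the prime field F_p with F' = 0, return G with G^p = F.
    On F_p the Frobenius map c -> c^p is the identity, so c_j = a_j."""
    if any(F[i] % p for i in range(len(F)) if i % p != 0):
        raise ValueError("F'(X) is not 0")
    return [F[i] for i in range(0, len(F), p)]

F = [1, 0, 0, 2, 0, 0, 1]            # X^6 + 2X^3 + 1 over F_3
G = pth_root(F, 3)                   # [1, 2, 1], i.e. (X + 1)^2
Gp = [1]
for _ in range(3):
    Gp = poly_mul(Gp, G, 3)
print(G, Gp == F)
```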

13. In (a), if $F(X) = G(X)H(X)$ is reducible and $r_1$ is a root of $G(X)$, then $\sigma(r_1)$ is a root of $G(X)$ for any $\sigma \in \mathrm{Gal}(K/k)$. Consequently the orbit of $r_1$ under $\mathrm{Gal}(K/k)$ is a proper subset of the set of roots of $F(X)$. Conversely if $F(X)$ is irreducible and $r_j$ is given, then the uniqueness of simple extensions gives us a $k$ isomorphism of $k(r_1)$ onto $k(r_j)$. Theorem 9.13' shows that this isomorphism extends to a $k$ automorphism of $K$, and hence $\mathrm{Gal}(K/k)$ is transitive on the set of roots of $F(X)$.

In (b), the transitivity follows from (a) and the irreducibility of $\Phi_8(X)$ over $\mathbb{Q}$. Let $\zeta = e^{2\pi i/8}$. The roots of $\Phi_8(X) = X^4 + 1$ are $\zeta, \zeta^3, \zeta^5, \zeta^7$. So if $\sigma$ is in $\mathrm{Gal}(K/\mathbb{Q})$, then $\sigma(\zeta) = \zeta^k$ with $k$ odd. Then $\sigma^2(\zeta) = \sigma(\zeta^k) = \sigma(\zeta)^k = (\zeta^k)^k = \zeta^{k^2}$. Since the square of any odd integer is congruent to $1$ modulo $8$, $\sigma^2(\zeta) = \zeta$. Thus each $\sigma$ has $\sigma^2 = 1$, and $\mathrm{Gal}(K/\mathbb{Q})$ cannot contain a $4$-cycle.
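The arithmetic step in (b) can be checked by machine. Using the standard identification of $\sigma$ with the exponent $k$ in $\sigma(\zeta) = \zeta^k$, the Python sketch below (function name ours) computes the multiplicative order of each unit modulo $n$. For $n = 8$ every nontrivial element has order $2$, so no element of order $4$ occurs; for contrast, $n = 5$ does have units of order $4$.

```python
from math import gcd

def unit_orders(n):
    """Map each unit k modulo n to its multiplicative order."""
    orders = {}
    for k in range(1, n):
        if gcd(k, n) == 1:
            x, m = k % n, 1
            while x != 1:
                x = x * k % n
                m += 1
            orders[k] = m
    return orders

print(unit_orders(8))   # sigma with sigma(zeta) = zeta^k for k = 1, 3, 5, 7
print(unit_orders(5))
```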

In (c), the irreducibility of $F(X)$ implies that $F(X)$ is the minimal polynomial of $r_1$. Hence $[k(r_1) : k] = n$. Since $k(r_1) \subseteq K$, $[k(r_1) : k]$ must divide $[K : k]$, and $n$ divides $[K : k]$. Therefore $n$ divides the equal integer $|\mathrm{Gal}(K/k)|$. If $n$ is prime, then the fact that $n$ divides the order of $\mathrm{Gal}(K/k)$ implies that $\mathrm{Gal}(K/k)$ contains an element of order $n$, by Sylow's Theorems. The only elements of order $n$ in $\mathfrak{S}_n$ are the $n$-cycles, and hence $\mathrm{Gal}(K/k)$ contains at least one $n$-cycle.

14. In (a), we have $L_{k+1} = L_k(\sqrt{a_{k+1}})$, and hence $[L_{k+1} : L_k]$ equals $1$ or $2$. By induction, $[L_k : \mathbb{Q}]$ is a power of $2$, and the power is at most the number of steps in the induction, namely $k$.

In (b), associate to each subset $S$ of $\{1, \dots, k\}$ the element $v_S = \prod_{j \in S}\sqrt{a_j}$ in $L_k$. The product of any two such elements is an integer multiple of a third such element, and hence the elements $v_S$ span $L_k$ linearly over $\mathbb{Q}$. Since there are $2^k$ such elements, they form a vector-space basis. The extension $L_k/\mathbb{Q}$ is separable, being in characteristic $0$, and it is normal as the splitting field of $\prod_{j=1}^k (X^2 - a_j)$. So it is a finite Galois extension. Any member $\sigma$ of $\mathrm{Gal}(L_k/\mathbb{Q})$ must permute the roots of each $X^2 - a_j$ and hence must send $\sqrt{a_j}$ to $\pm\sqrt{a_j}$. On the other hand, $\sigma$ is determined by its effect on each $\sqrt{a_j}$. Since $\mathrm{Gal}(L_k/\mathbb{Q})$ has order $2^k$, there exists for each subset $S$ of $\{1, \dots, k\}$ one and only one $\sigma$ such that $\sigma(\sqrt{a_j}) = -\sqrt{a_j}$ for $j \in S$ and $\sigma(\sqrt{a_j}) = +\sqrt{a_j}$ for $j \notin S$. The group $\mathrm{Gal}(L_k/\mathbb{Q})$ consists exactly of these elements.
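The multiplication rule behind the spanning claim in (b) is $v_Sv_T = \big(\prod_{j \in S \cap T} a_j\big)\,v_{S \triangle T}$, where $S \triangle T$ is the symmetric difference. The Python sketch below (helper names ours) encodes the rule with frozensets and checks it numerically for the sample values $a_1 = 2$, $a_2 = 3$, $a_3 = 5$.

```python
import math
from itertools import combinations

a = {1: 2, 2: 3, 3: 5}   # sample square-free, pairwise relatively prime values

def v(S):
    """Numerical value of v_S, the product of sqrt(a_j) over j in S."""
    return math.prod(math.sqrt(a[j]) for j in S)

def basis_product(S, T):
    """Return (m, U) with v_S * v_T = m * v_U, where U is the symmetric difference."""
    S, T = frozenset(S), frozenset(T)
    return math.prod(a[j] for j in S & T), S ^ T

subsets = [frozenset(c) for r in range(len(a) + 1) for c in combinations(a, r)]
for S in subsets:
    for T in subsets:
        m, U = basis_product(S, T)
        assert abs(v(S) * v(T) - m * v(U)) < 1e-9
print(len(subsets), "basis elements; product rule verified")
```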

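In (c), let $\sigma_j$ be the member of $\mathrm{Gal}(L_k/\mathbb{Q})$ with $\sigma_j(\sqrt{a_i}) = -\sqrt{a_i}$ for $i = j$ and $\sigma_j(\sqrt{a_i}) = +\sqrt{a_i}$ for $i \ne j$. Then $\sigma_j(v_S) = -v_S$ if $j$ is in $S$, and $\sigma_j(v_S) = +v_S$ if $j$ is not in $S$.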

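Arguing by contradiction, let $\sqrt{a_{k+1}} = \sum_S c_Sv_S$ with each $c_S$ in $\mathbb{Q}$. If $\sigma_j(\sqrt{a_{k+1}}) = \sqrt{a_{k+1}}$, then we have
$$\sum_{\text{all } S} c_Sv_S = \sqrt{a_{k+1}} = \sigma_j(\sqrt{a_{k+1}}) = \sum_{\text{all } S} c_S\sigma_j(v_S) = -\sum_{S \text{ with } j \in S} c_Sv_S + \sum_{S \text{ with } j \notin S} c_Sv_S,$$
and it follows, by the linear independence of the $v_S$, that $c_S = 0$ whenever $j$ is in $S$. On the other hand, if $\sigma_j(\sqrt{a_{k+1}}) = -\sqrt{a_{k+1}}$, then we have
$$\sum_{\text{all } S} c_Sv_S = \sqrt{a_{k+1}} = -\sigma_j(\sqrt{a_{k+1}}) = -\sum_{\text{all } S} c_S\sigma_j(v_S) = \sum_{S \text{ with } j \in S} c_Sv_S - \sum_{S \text{ with } j \notin S} c_Sv_S,$$
and it follows that $c_S = 0$ whenever $j$ is not in $S$.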

Define $S_0 = \{j \mid \sigma_j(\sqrt{a_{k+1}}) = -\sqrt{a_{k+1}}\}$. From the above it follows that $c_S = 0$ whenever some member of $S_0$ is not in $S$, and that $c_S = 0$ whenever some member of the complement of $S_0$ is in $S$. In other words, $c_S = 0$ except for $c_{S_0}$. We conclude that $\sqrt{a_{k+1}} = c_{S_0}v_{S_0} = c_{S_0}\prod_{j \in S_0}\sqrt{a_j}$ and hence that $a_{k+1} = c_{S_0}^2\prod_{j \in S_0}a_j$. This contradicts the hypothesis that $\{a_1, \dots, a_n\}$ are relatively prime and square free. Hence $\sqrt{a_{k+1}}$ does not lie in $L_k$. This proves (c), and we obtain $[L_{k+1} : L_k] = 2$. By induction we see that $[L : \mathbb{Q}] = 2^n$. This proves (d).

15. For (a) and (b), Lemma 9.45 shows that $X^p - a$ is irreducible over $\mathbb{Q}$. Hence $[\mathbb{Q}(r) : \mathbb{Q}] = p$. Let $\zeta$ be a primitive $p$th root of $1$. Then $[\mathbb{Q}(\zeta) : \mathbb{Q}] = p - 1$ is relatively prime to $[\mathbb{Q}(r) : \mathbb{Q}] = p$. Problem 10a shows that $\Phi_p(X)$ is irreducible over $\mathbb{Q}(r)$. Since $\zeta$ and $r$ generate $K$, Problem 10b shows that $[K : \mathbb{Q}] = [\mathbb{Q}(r) : \mathbb{Q}]\,[\mathbb{Q}(\zeta) : \mathbb{Q}] = p(p-1)$.

In (c), the Galois correspondence between intermediate fields and subgroups of $G = \mathrm{Gal}(K/\mathbb{Q})$ associates $\mathbb{Q}(\zeta)$ to the subgroup $N = \mathrm{Gal}(K/\mathbb{Q}(\zeta))$, and it associates $\mathbb{Q}(r)$ to the subgroup $H = \mathrm{Gal}(K/\mathbb{Q}(r))$. Since $\mathbb{Q}(\zeta)/\mathbb{Q}$ is a normal extension, $N$ is a normal subgroup of $G$. Any member of $H \cap N$ fixes $r$ and $\zeta$, hence fixes all of $K$; thus $H \cap N = \{1\}$. The order of $N$ is $[K : \mathbb{Q}(\zeta)] = p$, and the order of $H$ is $[K : \mathbb{Q}(r)] = p - 1$. Therefore $|G| = |H|\,|N|$, and $G$ is a semidirect product with $N$ normal.

Proposition 4.44 says that the action of an internal semidirect product is given by $\tau_h(n) = hnh^{-1}$. Let us identify $\tau_h$. Let $h \in H = \mathrm{Gal}(K/\mathbb{Q}(r))$ have $h(r) = r$ and $h(\zeta) = \zeta^k$, and let $n$ in $N = \mathrm{Gal}(K/\mathbb{Q}(\zeta))$ have $n(\zeta) = \zeta$ and $n(r) = r\zeta^l$. Then $hnh^{-1}(r) = hn(r) = h(r\zeta^l) = r\zeta^{kl}$, and $hnh^{-1}(\zeta) = hn(\zeta^{k'}) = h(\zeta^{k'}) = \zeta$, where $kk' \equiv 1 \bmod p$. So if $n$ sends $r$ to $r\zeta^l$ and $h(\zeta) = \zeta^k$, then $hnh^{-1}$ is the member of $N$ sending $r$ to $r\zeta^{kl}$.

This $n$ is the member of $N$ corresponding to $l \in \mathbb{F}_p$, and this $h$ is the member of $H$ corresponding to $k \in \mathbb{F}_p^\times$. We have just shown that $hnh^{-1}$ is the member of $N$ corresponding to $kl \in \mathbb{F}_p$. Hence the action corresponds to multiplication of $\mathbb{F}_p^\times$ on additive $\mathbb{F}_p$.
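The identification in (c) can be made concrete. Since $\sigma(r\zeta^x) = \sigma(r)\sigma(\zeta)^x$, an element with $\sigma(\zeta) = \zeta^k$ and $\sigma(r) = r\zeta^l$ acts on the exponent $x$ of the roots $r\zeta^x$ as $x \mapsto kx + l$ modulo $p$. The Python sketch below (names ours; `pow(k, -1, p)` needs Python 3.8 or later) encodes such maps as pairs $(k, l)$ and checks that $hnh^{-1}$ is translation by $kl$ for $p = 7$.

```python
def compose(f, g, p):
    """(f o g) for affine maps x -> k*x + l (mod p), each stored as (k, l)."""
    k1, l1 = f
    k2, l2 = g
    return (k1 * k2 % p, (k1 * l2 + l1) % p)

def inverse(f, p):
    """Inverse of x -> k*x + l, namely x -> k^{-1} x - k^{-1} l."""
    k, l = f
    k_inv = pow(k, -1, p)
    return (k_inv, -k_inv * l % p)

p = 7
for k in range(1, p):
    h = (k, 0)                         # h(r) = r, h(zeta) = zeta^k
    for l in range(p):
        n = (1, l)                     # n(zeta) = zeta, n(r) = r zeta^l
        conj = compose(compose(h, n, p), inverse(h, p), p)
        assert conj == (1, k * l % p)  # translation by kl
print("h n h^-1 is translation by kl for p =", p)
```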

16. $[K : k] = |\mathrm{Gal}(K/k)|$, and $\mathrm{Gal}(K/k)$ is a subgroup of $\mathfrak{S}_n$. Being a subgroup, its order divides the order of $\mathfrak{S}_n$, which is $n!$.
17. In (a), the most general element of $K$ is of the form $x + yr$ with $x$ and $y$ in $k$, and its square is $(x^2 + y^2r^2) + 2xyr$. This is in $k$ if and only if $xy = 0$, i.e., if and only if $x + yr$ is in $k$ or in $rk$. In other words, the only squares in $K$ that lie in $k$ are the obvious ones.

In (b), the same remarks apply unless the characteristic is $2$. If the characteristic is $2$, then $(x + yr)^2 = x^2 + y^2r^2$, and this is in $k$ for all $x$ and $y$. Hence every element of $K$ is a square.

18. The finite group $G$ may be regarded as a subgroup of the symmetric group $\mathfrak{S}_n$ for $n = |G|$. It was shown in Example 3 of Section 17 that there exists a finite Galois extension $K$ of $\mathbb{Q}$ with Galois group $\mathfrak{S}_n$. Let $k$ be the fixed field of $G$ within $K$. Then $\mathrm{Gal}(K/k) = G$.

19. The polynomial in question is fixed by every element of the Galois group. Hence its coefficients are in the subfield of $K$ fixed by all elements of $\mathrm{Gal}(K/k)$. This is $k$.

20. For (a), define $F(X) = \prod_{j=1}^n (X - x_j)$. For $\varphi$ in $H$, we have
$$F^\varphi(X) = \prod_{j=1}^n (X - \varphi(x_j)) = \prod_{j=1}^n (X - x_j) = F(X),$$
since $\varphi$ permutes $x_1, \dots, x_n$. Thus $F(X)$ is in $K^H[X]$. Let $M(X)$ be the minimal polynomial of $x_1$ over $K^H$. Since $F(x_1) = 0$, $M(X)$ divides $F(X)$. On the other hand, the equalities $M^\varphi(X) = M(X)$ and $M(x_1) = 0$ imply that $M(x_j) = 0$ for each $j$. Thus $M(X)$ has degree at least $n$, and we conclude that $F(X) = M(X)$.

In (b), $n$ is the number of elements in an orbit of $H$ and hence divides $|H|$.

In (c), when the isotropy subgroup of $H$ at $x_1$ is trivial, $n = |H|$. Therefore $[K^H(x_1) : K^H] = n = |H| = [K : K^H]$, the last equality following from Corollary 9.37. Since $K^H(x_1) \subseteq K$, it follows that $K^H(x_1) = K$.

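21. For (a), let $\varphi(z) = \frac{az+b}{cz+d}$ with $ad - bc \ne 0$. Then we have a substitution homomorphism of $\mathbb{C}[X]$ into $\mathbb{C}(z)$ fixing $\mathbb{C}$ and sending $X$ into $\varphi(z)$. Since the range is a field, this factors through the field of fractions of $\mathbb{C}[X]$ to give a field mapping $\mathbb{C}(X) \to \mathbb{C}(z)$. We can regard the result as a map of $\mathbb{C}(z)$ into itself, and we write the map of $\mathbb{C}(z)$ into itself as $\Phi_{\varphi^{-1}}$. The formula is $\Phi_{\varphi^{-1}}(r) = r \circ \varphi$ for $r = r(z)$ in $\mathbb{C}(z)$. Then
$$\Phi_{\psi\varphi}(r) = r \circ (\psi\varphi)^{-1} = (r \circ \varphi^{-1}) \circ \psi^{-1} = \Phi_\psi(r \circ \varphi^{-1}) = \Phi_\psi(\Phi_\varphi(r)),$$
and hence $\Phi_{\psi\varphi} = \Phi_\psi \circ \Phi_\varphi$. From this it follows that $\Phi_{\varphi^{-1}}$ is a two-sided inverse of $\Phi_\varphi$. Hence $\Phi_\varphi$ is an automorphism.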

For (b), $\Phi_\sigma(w(z)) = w(\sigma^{-1}(z)) = (-z)^2 + (-z)^{-2} = z^2 + z^{-2} = w(z)$, and $\Phi_\tau(w(z)) = w(\tau^{-1}(z)) = (1/z)^2 + (1/z)^{-2} = z^2 + z^{-2} = w(z)$. Since $\Phi_{\sigma\tau} = \Phi_\sigma\Phi_\tau$ by (a), it follows that every element of $H$ fixes $w$. Since each $\Phi_\varphi$ is a field automorphism, $\mathbb{C}(w)$ lies in $K^H$.

For (c), we know from (b) that $\mathbb{C}(w) \subseteq K^H$. The orbit of $z$ under $H$ has $4$ elements, and Problem 20a shows that the minimal polynomial of $z$ over $K^H$ has degree $4$ and is equal to
$$F(X) = (X - z)(X + z)(X - z^{-1})(X + z^{-1}) = (X^2 - z^2)(X^2 - z^{-2}) = X^4 - w(z)X^2 + 1.$$
The polynomial $F(X)$ is irreducible over $K^H$, and its formula shows that its coefficients are in the smaller field $\mathbb{C}(w)$. Hence it is irreducible over $\mathbb{C}(w)$ and is the minimal polynomial of $z$ over $\mathbb{C}(w)$.
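A quick numerical check of the factorization in (c) is sketched below in Python. At a sample point $z$ of our own choosing, the orbit $z, -z, z^{-1}, -z^{-1}$ consists of roots of $X^4 - w(z)X^2 + 1$, and $w$ takes the same value at all four points.

```python
z = 1.3 + 0.7j                         # arbitrary sample point, not from the text

def w(t):
    """w(t) = t^2 + t^(-2)."""
    return t ** 2 + t ** -2

orbit = [z, -z, 1 / z, -1 / z]         # the orbit of z under H
for x in orbit:
    assert abs(w(x) - w(z)) < 1e-9                    # w is H-invariant
    assert abs(x ** 4 - w(z) * x ** 2 + 1) < 1e-9     # x is a root of F(X)
print("all four orbit points are roots of X^4 - w(z) X^2 + 1")
```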

For (d), (c) shows that $[\mathbb{C}(w)(z) : \mathbb{C}(w)] = 4$. Since $\mathbb{C}(w)(z) = \mathbb{C}(z) = K$, we have $[K : \mathbb{C}(w)] = 4$. Since $[K : \mathbb{C}(w)] = [K : K^H]\,[K^H : \mathbb{C}(w)]$ and since $[K : K^H] = 4$ by Corollary 9.37, $K^H = \mathbb{C}(w)$.

22. For (a), let $L = K(\sqrt{u})$ and $K = k(\sqrt{v})$. The minimal polynomial of $\sqrt{u}$ over $K$ is $X^2 - u$, and this must divide the minimal polynomial of $\sqrt{u}$ over $k$. The degree of the latter polynomial equals $[k(\sqrt{u}) : k]$, which must divide $4$. Hence it must be $2$ or $4$. If it is $2$, then $X^2 - u$ lies in $k[X]$, and $u$ is in $k$. We return to this case in a moment. Suppose that the minimal polynomial of $\sqrt{u}$ over $k$ has degree $4$. Let us write $u = r + s\sqrt{v}$ for some $r$ and $s$ in $k$. Then $\sqrt{u}$ is a root of
$$(X^2 - r - s\sqrt{v})(X^2 - r + s\sqrt{v}) = (X^2 - r)^2 - s^2v = X^4 - 2rX^2 + (r^2 - s^2v),$$
which is a quartic polynomial in $k[X]$. Since the minimal polynomial over $k$ has degree $4$, this is the minimal polynomial and is irreducible. Thus (a) holds with the root $\sqrt{u}$ in the role of $r$.

The remaining case is that $u$ is in $k$ but $\sqrt{u}$ is not in $k$. Consider $\pm\sqrt{u} \pm \sqrt{v}$. None of these is in $k$. The computation
$$\begin{aligned}
(X + \sqrt{u} + \sqrt{v})(X + \sqrt{u} - \sqrt{v})(X - \sqrt{u} + \sqrt{v})(X - \sqrt{u} - \sqrt{v}) &= ((X + \sqrt{u})^2 - v)((X - \sqrt{u})^2 - v) \\
&= (X^2 + u - v + 2X\sqrt{u})(X^2 + u - v - 2X\sqrt{u}) \\
&= (X^2 + u - v)^2 - 4uX^2 = X^4 + 2uX^2 - 2vX^2 + (u - v)^2 - 4uX^2 \\
&= X^4 - 2(u + v)X^2 + (u - v)^2 = X^4 + bX^2 + c
\end{aligned}$$
shows that these are all roots of a quartic polynomial in $k[X]$ of the correct kind, and the question concerns its irreducibility over $k$. As in the previous paragraph, reducibility implies that it is the product of two irreducible quadratic members of $k[X]$. Then the product of two of the first-order factors is in $k[X]$, and the sum of those two roots must be in $k$. The six possible sums of pairs of roots are $\pm 2\sqrt{u}$, $\pm 2\sqrt{v}$, and $0$ twice. Since $\sqrt{u}$ and $\sqrt{v}$ are not in $k$, the irreducible quadratic must be $X^2 - (\sqrt{u} + \sqrt{v})^2$ or $X^2 - (\sqrt{u} - \sqrt{v})^2$. However, $(\sqrt{u} \pm \sqrt{v})^2 = u + v \pm 2\sqrt{uv}$, and the fact that $\sqrt{u}$ is not in $K = k(\sqrt{v})$ implies that $\sqrt{uv}$ is not in $k$, since otherwise $\sqrt{u} = \sqrt{uv}/\sqrt{v}$ would lie in $K$. Thus neither of $(\sqrt{u} \pm \sqrt{v})^2$ is in $k$, and the quartic polynomial is indeed irreducible. This completes (a).

In (b), we have $4 = [L : k] = [L : k(r)]\,[k(r) : k] = 4[L : k(r)]$. Thus $[L : k(r)] = 1$, and $L = k(r)$.

In (c), suppose that $c = t^2$ for the given $F(X)$. Find members $u$ and $v$ of $k$ with $-2(u + v) = b$ and $u - v = t$. Then the displayed computation in (a) shows that $\pm\sqrt{u} \pm \sqrt{v}$ are the roots of $X^4 - 2(u + v)X^2 + (u - v)^2 = X^4 + bX^2 + c$. The given root $r$ must be one of these. Say that $r = \sqrt{u} + \sqrt{v}$ without loss of generality. Since $[k(r) : k] = 4$ and $[L : k] = 4$ and $k(r) \subseteq L$, we have $L = k(r)$. On the other hand, $k(r) \subseteq k(\sqrt{u}, \sqrt{v})$, and $[k(\sqrt{u}, \sqrt{v}) : k] = [k(\sqrt{u}, \sqrt{v}) : k(\sqrt{u})]\,[k(\sqrt{u}) : k] \le 2 \cdot 2 = 4$. Hence $k(\sqrt{u}, \sqrt{v}) = k(r) = L$. Then all four roots $\pm\sqrt{u} \pm \sqrt{v}$ of $F(X)$ lie in $L$, $L$ is the splitting field of $F(X)$ over $k$, and $L/k$ is normal. The Galois group is generated by one element that sends $\sqrt{u}$ to $-\sqrt{u}$ and fixes $\sqrt{v}$, and by a second element that fixes $\sqrt{u}$ and sends $\sqrt{v}$ to $-\sqrt{v}$. Hence it is $C_2 \times C_2$.
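The construction in (c) can be tried on a sample polynomial of our own choosing, $X^4 - 10X^2 + 1$ over $\mathbb{Q}$, for which $c = 1 = 1^2$. Solving $-2(u + v) = b$ and $u - v = t$ gives $u = 3$ and $v = 2$, and the Python sketch below (function name ours, characteristic $0$ assumed) confirms numerically that $\pm\sqrt{3} \pm \sqrt{2}$ are the four roots.

```python
import math
from fractions import Fraction

def uv_from_bt(b, t):
    """Solve -2(u + v) = b and u - v = t exactly over the rationals."""
    s = Fraction(-b, 2)          # u + v
    return (s + t) / 2, (s - t) / 2

b, c, t = -10, 1, 1              # sample: X^4 - 10 X^2 + 1, with c = t^2
u, v = uv_from_bt(b, t)
roots = [e1 * math.sqrt(u) + e2 * math.sqrt(v) for e1 in (1, -1) for e2 in (1, -1)]
for x in roots:
    assert abs(x ** 4 + b * x ** 2 + c) < 1e-9
print(u, v, roots)
```

This sample polynomial is the minimal polynomial of $\sqrt{3} + \sqrt{2}$ over $\mathbb{Q}$, so it illustrates the $C_2 \times C_2$ case of (c).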

Conversely suppose that $L/k$ is normal with Galois group $G = \mathrm{Gal}(L/k) \cong C_2 \times C_2$. Let an irreducible polynomial $X^4 + bX^2 + c$ in $k[X]$ with a root $r$ in $L$ be given. Since $L/k$ is normal, $X^4 + bX^2 + c$ splits in $L$. Let the four roots be $\pm r$ and $\pm s$. The square $u$ of any of these roots satisfies $u^2 + bu + c = 0$ and therefore lies in a quadratic extension of $k$ within $L$, the same quadratic extension for each root. Let us define $K$ to be this extension. Then $K = k(\sqrt{b^2 - 4c})$. Because of the structure of $G$, there exists exactly one element $\sigma$ in $G$ whose fixed field is $K$. The minimal polynomial of $\pm r$ over $K$ is $X^2 + \tfrac12 b \pm \tfrac12\sqrt{b^2 - 4c}$ for one of the two choices of sign, and the minimal polynomial of $\pm s$ over $K$ is the one for the other choice of sign. The element $\sigma$ must then permute the roots of each of these polynomials, and it follows that $\sigma(r) = \pm r$ and $\sigma(s) = \pm s$. Since neither $r$ nor $s$ is in $K$, we must in fact have $\sigma(r) = -r$ and $\sigma(s) = -s$. Therefore $\sigma(rs) = rs$. One of the other two nontrivial members $\tau$ of $G$ has $\tau(r) = s$. Since $\tau^2 = 1$, we have $\tau(s) = r$. Thus $\tau(rs) = rs$, and we see that every member of $G$ fixes $rs$. Consequently $rs$ is in $k$. Since $rs$ is equal for some choice of signs to
$$\pm\sqrt{-\tfrac12 b + \tfrac12\sqrt{b^2 - 4c}}\,\sqrt{-\tfrac12 b - \tfrac12\sqrt{b^2 - 4c}} = \pm\sqrt{\tfrac14 b^2 - \tfrac14(b^2 - 4c)} = \pm\sqrt{c},$$
$\sqrt{c}$ is in $k$. In other words, $c$ is the square of a member of $k$, as asserted.

In (d), suppose that $c^{-1}(b^2 - 4c)$ for the given $F(X)$ is a square in $k$. Arguing with $r^2$ as in (c), we see that $K = k(\sqrt{b^2 - 4c})$. Making the same computation as in the display just above, we see that $rs = \pm\sqrt{c}$. Since $c^{-1}(b^2 - 4c)$ is a square in $k$, $\sqrt{c}$ lies in $K$. One of the roots, say $r$, lies in $L$, and the product $rs = \pm\sqrt{c}$ lies in $K$, hence in $L$. We conclude that $\pm r$ and $\pm s$ all lie in $L$. In other words, $L$ is the splitting field of $F(X)$ over $k$, and thus $L/k$ is normal. The Galois group must be either $C_2 \times C_2$ or $C_4$. If it is $C_2 \times C_2$, then (c) shows that $\sqrt{c}$ lies in $k$. Under our assumption that $c^{-1}(b^2 - 4c)$ is a square, $\sqrt{b^2 - 4c}$ then lies in $k$. Consequently $F(X)$ is reducible, contradiction. We conclude that the Galois group is $C_4$.

Conversely suppose that $L/k$ is normal with Galois group $G = \operatorname{Gal}(L/k) \cong C_4$. Let an irreducible polynomial $X^4 + bX^2 + c$ in $k[X]$ with a root $r$ in $L$ be given. Arguing with $r^2$, we see that $r^2$ lies in $K = k(\sqrt{b^2 - 4c})$. Since $L$ is generated by $k$ and $r$, a generator $\sigma$ of $G$ cannot send $r$ into $\pm r$. On the other hand, some element of $G$ has to send $r$ into $-r$ since $-r$ is a root of the given polynomial. Therefore $\sigma^2(r) = -r$. Then we have $\sigma(r\sigma(r)) = \sigma(r)\sigma^2(r) = -r\sigma(r)$, and we see that $\sigma^2(r\sigma(r)) = r\sigma(r)$. Consequently $r\sigma(r)$ lies in $K$. Computing as in (c), we find that $\sqrt{c}$ lies in $K$. This member of $K$ has its square in $k$, and Problem 17 shows that $\sqrt{c}$ lies in $k$ or in the set of products $k\sqrt{b^2 - 4c}$. By (c), $\sqrt{c}$ cannot lie in $k$, and therefore $\sqrt{c} = d\sqrt{b^2 - 4c}$ for some $d$ in $k$. Hence $c^{-1}(b^2 - 4c) = d^{-2}$ for an element $d$ of $k$.

For (e), one can take $L = K(\sqrt{\sqrt{2}})$ and $K = \mathbb{Q}(\sqrt{2})$, with $k = \mathbb{Q}$. We can easily see directly that $L/k$ is not normal. But let us use (c) and (d). The minimal polynomial $F(X)$ in question is $X^4 - 2$, with $b = 0$ and $c = -2$. The conditions in (c) and (d) say that $L/k$ is normal if and only if either $-2$ is a square in $\mathbb{Q}$ or $-1$ is a square in $\mathbb{Q}$. Neither condition is satisfied, and hence $L/k$ is not normal.

23. A cubic will be irreducible if it is divisible by no degree-one factor over $\mathbb{Q}$, hence if it has no root in $\mathbb{Q}$. Since these cubics are monic and are in $\mathbb{Z}[X]$, they will be irreducible if they have no integer root. An integer root must divide the constant term, and we check that neither of $\pm 1$ is a root in either case. Hence both cubics are irreducible. By Problem 13 the Galois group in each case is a transitive subgroup of $\mathfrak{S}_3$, hence is $\mathfrak{S}_3$ or $\mathfrak{A}_3$. The discriminant $-4p^3 - 27q^2$ is 81 in the first case and $-31$ in the second case; this is a square in the first case but not in the second case. Thus $X^3 - 3X + 1$ has Galois group $\mathfrak{A}_3$, and $X^3 + X + 1$ has Galois group $\mathfrak{S}_3$.
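Problem 23 relies on two facts: the cubics have no integer roots, and $-4p^3 - 27q^2$ takes the values 81 and $-31$. Both are easy to confirm by machine. The following small Python sketch handles depressed cubics $X^3 + pX + q$ with integer $p, q$ and $q \ne 0$. The function names are illustrative only.

```python
import math


def cubic_discriminant(p, q):
    """Discriminant -4p^3 - 27q^2 of the depressed cubic X^3 + pX + q."""
    return -4 * p**3 - 27 * q**2


def is_square(n):
    """True if the integer n is the square of an integer."""
    return n >= 0 and math.isqrt(n) ** 2 == n


def integer_roots(p, q):
    """Integer roots of X^3 + pX + q (q != 0); any such root divides q."""
    divisors = [d for d in range(1, abs(q) + 1) if q % d == 0]
    candidates = [s * d for d in divisors for s in (1, -1)]
    return [x for x in candidates if x**3 + p * x + q == 0]


for p, q in [(-3, 1), (1, 1)]:          # X^3 - 3X + 1 and X^3 + X + 1
    d = cubic_discriminant(p, q)
    print(p, q, d, is_square(d), integer_roots(p, q))
```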

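24. The extension field is either $K$ itself, in which case the Galois group remains $\mathfrak{S}_3$, or it is $L = K[\sqrt{-3}\,]$. Since $K/\mathbb{Q}$ is normal, $\operatorname{Gal}(L/K)$ is a normal subgroup of $\operatorname{Gal}(L/\mathbb{Q})$ of order 2 with quotient isomorphic to $\operatorname{Gal}(K/\mathbb{Q}) \cong \mathfrak{S}_3$. The groups of order 12 are classified in Problems 45-48 at the end of Chapter IV. Two such groups are abelian, one is $\mathfrak{A}_4$, one is a semidirect product $C_3 \rtimes C_4$, and one is $C_2 \times \mathfrak{S}_3$.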

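Write a general element of $L$ as $a + b\sqrt{-3}$ with $a, b$ in $K$. Define $\tau(a + b\sqrt{-3}) = a - b\sqrt{-3}$. This is the nontrivial member of the 2-element group $\operatorname{Gal}(L/K)$. If $\sigma$ is in $\operatorname{Gal}(K/\mathbb{Q})$, then $\sigma$ extends to a member $\tilde\sigma$ of $\operatorname{Gal}(L/\mathbb{Q})$ by the definition $\tilde\sigma(a + b\sqrt{-3}) = \sigma(a) + \sigma(b)\sqrt{-3}$. Clearly $\tilde\sigma$ respects addition. To see that it respects multiplication, we compute

$$\begin{aligned}
\tilde\sigma(a + b\sqrt{-3})\,\tilde\sigma(c + d\sqrt{-3}) &= (\sigma(a) + \sigma(b)\sqrt{-3})(\sigma(c) + \sigma(d)\sqrt{-3}) \\
&= (\sigma(a)\sigma(c) - 3\sigma(b)\sigma(d)) + (\sigma(b)\sigma(c) + \sigma(a)\sigma(d))\sqrt{-3} \\
&= \sigma(ac - 3bd) + \sigma(bc + ad)\sqrt{-3} \\
&= \tilde\sigma\big((ac - 3bd) + (bc + ad)\sqrt{-3}\big) \\
&= \tilde\sigma\big((a + b\sqrt{-3})(c + d\sqrt{-3})\big).
\end{aligned}$$

Each $\tilde\sigma$ commutes with $\tau$, and $\sigma \mapsto \tilde\sigma$ is a one-one homomorphism. It follows that $\operatorname{Gal}(L/\mathbb{Q})$ is the direct product $C_2 \times \mathfrak{S}_3$, the subgroup $C_2$ being $\operatorname{Gal}(L/K)$.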

25. Yes. Let $L$ be the intermediate field corresponding to the subgroup $\{(1), (1\,2)\}$. Since the subgroup is not normal, $L/k$ is not normal. Let $r$ be any element of $L$ not in $k$. Then the minimal polynomial of $r$ over $k$ has degree 3, and it does not split in $L$ since $L/k$ is not normal. Its splitting field has to be something between $L$ and $K$, and the only choice is $K$.

26.  Yes,  substitute  and  check  it. 

28. In (a), direct expansion of the right side gives $(X - r)(X^2 + rX + (r^2 + p)) = X^3 + pX - r^3 - pr$. Since $-r^3 - pr = q$, the assertion follows.
For (b), let us check that $r^2(-4p^3 - 27q^2) = (-3r^2 - 4p)(3q + 2pr)^2$, from which the assertion follows. In fact, the right side equals

$$\begin{aligned}
&-(3r^2 + 4p)(9q^2 + 12pqr + 4p^2r^2) \\
&\quad= -(36pq^2 + 48p^2qr + 27q^2r^2 + 16p^3r^2 + 36pqr^3 + 12p^2r^4) \\
&\quad= -r^2(4p^3 + 27q^2) - 12p^3r^2 - 36pq^2 - 48p^2qr - 36pqr^3 - 12p^2r^4 \\
&\quad= -r^2(4p^3 + 27q^2) - 12p^3r^2 - 36pq^2 - 48p^2qr - 36pq(-pr - q) - 12p^2r(-pr - q) \\
&\quad= -r^2(4p^3 + 27q^2),
\end{aligned}$$

where the fourth line substitutes $r^3 = -pr - q$.
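The identity in (b) only needs the relation $q = -r^3 - pr$. It can therefore be spot-checked in exact integer arithmetic by choosing $p$ and $r$ freely and defining $q$ from them. The following is a minimal Python sketch. A check at finitely many points illustrates the computation above but does not replace it.

```python
def identity_holds(p, r):
    """Compare both sides of r^2(-4p^3 - 27q^2) = (-3r^2 - 4p)(3q + 2pr)^2,
    with q chosen as -r^3 - pr so that r is a root of X^3 + pX + q."""
    q = -r**3 - p * r
    lhs = r**2 * (-4 * p**3 - 27 * q**2)
    rhs = (-3 * r**2 - 4 * p) * (3 * q + 2 * p * r) ** 2
    return lhs == rhs


# exact integer arithmetic, so equality is tested without rounding
assert all(identity_holds(p, r) for p in range(-10, 11) for r in range(-10, 11))
```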

29. No. For example, $F(X)$ could have three real roots, and then $K$ would be a subfield of $\mathbb{R}$. A concrete example is $X^3 - 12X + 1$, which is $< 0$ at $-4$, is $> 0$ at 0, is $< 0$ at 1, and is $> 0$ at 4; the Intermediate Value Theorem shows that $F(X)$ has three real roots.
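The four sign evaluations for $X^3 - 12X + 1$ in Problem 29 can be reproduced directly. Each sign change between consecutive sample points brackets one real root.

```python
def f(x):
    """F(X) = X^3 - 12X + 1 from Problem 29."""
    return x**3 - 12 * x + 1


points = [-4, 0, 1, 4]
values = [f(x) for x in points]
# a sign change between consecutive points gives a real root in that interval
sign_changes = sum(1 for a, b in zip(values, values[1:]) if a * b < 0)
print(values, sign_changes)   # [-15, 1, -10, 17] 3
```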

30. The group in question is a subgroup of $\mathfrak{S}_5$. It is transitive because of the irreducibility, and it is a subgroup of $\mathfrak{A}_5$ since the discriminant is a square. Problem 13c shows that it contains a 5-cycle. The other cycle structures in $\mathfrak{A}_5$ are the 3-cycles and the pairs of 2-cycles. If a 3-cycle is present, then the group is all of $\mathfrak{A}_5$. Its order is then divisible by 15 and divides 60. The order cannot be 15, because all groups of order 15 are cyclic and $\mathfrak{A}_5$ has no element of order 15. It cannot be 30, because $\mathfrak{A}_5$, being simple, contains no subgroup of index 2.

Suppose there are no 3-cycles. A Sylow 2-subgroup may be taken to be a subgroup of $H = \{(1), (1\,2)(3\,4), (1\,3)(2\,4), (1\,4)(2\,3)\}$, and it acts by conjugation on the group of powers of a 5-cycle. The only nontrivial action of a 2-element group on a 5-element group carries elements to their inverses. Since no nontrivial element of $H$ commutes with a 5-cycle (because $\mathfrak{S}_5$ has no elements of order 10), the Sylow 2-subgroup contains at most two elements. If it is trivial, then the group in question is of order 5, consisting of the powers of a 5-cycle. If the Sylow 2-subgroup has 2 elements, we obtain a semidirect product of a 2-element group with the powers of the 5-cycle, and the result has to be isomorphic to the dihedral group $D_5$.

Thus the only possibilities are $C_5$, $D_5$, and $\mathfrak{A}_5$.

31. Computation shows that the discriminant is $2^{12}\,7^2\,19^2$, which is a square. By Proposition 9.63 the Galois group is a subgroup of $\mathfrak{A}_5$. Modulo 3, the given polynomial is $2 + 2x + x^5$ and is irreducible. By Theorem 9.64 the Galois group contains a 5-cycle. The given polynomial factors as $(7 + x)(7 + 10x + 7x^2 + x^3)$ modulo 11, and Theorem 9.64 shows that the Galois group contains a 3-cycle. The 5-cycle and 3-cycle generate all of $\mathfrak{A}_5$, and thus the Galois group is $\mathfrak{A}_5$.

32. Write $e$ and $f$ for $e_1$ and $f_1$. The proof of Theorem 9.64 showed that $f' = f$. Then $e'f = e'f' = |G_P| = |G|/g = efg/g = ef$, and hence $e' = e$.

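33. If $p$ factors in the intermediate ring as $\prod_i P_i^{e(P_i|p)}$ and $P_iU = \prod_j Q_{ij}^{e(Q_{ij}|P_i)}$, then

$$pU = \prod_i (P_iU)^{e(P_i|p)} = \prod_i \prod_j Q_{ij}^{e(Q_{ij}|P_i)\,e(P_i|p)}.$$

Hence $e(Q_{ij}|p) = e(Q_{ij}|P_i)\,e(P_i|p)$. The formula for the $f$'s follows from Corollary 9.7.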
34. Corollary 9.58 shows that the norm and the trace are the product and sum of $a + b\sqrt{m}$ and $a - b\sqrt{m}$. Hence they are $a^2 - b^2m$ and $2a$. This proves (a).

In (b), the minimal polynomial of $r = a + b\sqrt{m}$ has degree 2 if $b \ne 0$, and this is the same as the degree of the field polynomial. Hence the two polynomials are equal, and the minimal polynomial is $X^2 - (\operatorname{Tr} r)X + N(r)$. An algebraic integer is an algebraic element whose minimal polynomial over $\mathbb{Q}$ has integer coefficients, and (b) follows.

In (c), if $r = a + b\sqrt{m}$ is a unit with inverse $s$, then $N(r)N(s) = N(rs) = N(1) = 1$ shows that $N(r)$ is a unit with inverse $N(s)$. Conversely if $r$ is in $T$ with $N(r) = \pm 1$, then $r(a - b\sqrt{m}) = \pm 1$, and $\pm(a - b\sqrt{m})$ is an inverse element in $T$.

For (d), $\sqrt{2} - 1$ is a unit in the algebraic integers of $\mathbb{Q}[\sqrt{2}]$. Its inverse is $\sqrt{2} + 1$.
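The formulas in (a) and the unit in (d) can be checked with exact integer arithmetic, representing $a + b\sqrt{m}$ as the pair $(a, b)$. This is a minimal Python sketch. The pair encoding and the function names are illustrative choices, not notation from the text.

```python
def mul(x, y, m):
    """Multiply (a + b*sqrt(m)) by (c + d*sqrt(m)); elements are pairs (a, b)."""
    a, b = x
    c, d = y
    return (a * c + m * b * d, a * d + b * c)


def norm(x, m):
    """N(a + b*sqrt(m)) = a^2 - m*b^2, as in (a)."""
    a, b = x
    return a * a - m * b * b


def trace(x):
    """Tr(a + b*sqrt(m)) = 2a, as in (a)."""
    return 2 * x[0]


m = 2
r = (-1, 1)        # sqrt(2) - 1
r_inv = (1, 1)     # sqrt(2) + 1
print(mul(r, r_inv, m), norm(r, m), trace(r))   # (1, 0) -1 -2
```

Since $N(\sqrt{2} - 1) = -1$, part (c) already predicts that $\sqrt{2} - 1$ is a unit.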

35. With respect to the ordered basis $(1, \sqrt[3]{2}, (\sqrt[3]{2})^2)$, the matrix of multiplication by $a + b\sqrt[3]{2} + c(\sqrt[3]{2})^2$ is

$$\begin{pmatrix} a & 2c & 2b \\ b & a & 2c \\ c & b & a \end{pmatrix}.$$

The trace and norm are the trace and determinant of this matrix, namely $3a$ and $a^3 + 2b^3 + 4c^3 - 6abc$.
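The matrix and the norm formula can be cross-checked in exact arithmetic. Column $j$ of the matrix should hold the coordinates of $(a + bt + ct^2)\,t^j$ with $t = \sqrt[3]{2}$, and the determinant should equal $a^3 + 2b^3 + 4c^3 - 6abc$. The following is a minimal Python sketch; the helper names are illustrative only.

```python
def mult_matrix(a, b, c):
    """Matrix of multiplication by a + b*t + c*t^2 (t^3 = 2) in the basis (1, t, t^2)."""
    return [[a, 2 * c, 2 * b],
            [b, a, 2 * c],
            [c, b, a]]


def mul_cbrt2(x, y):
    """Multiply x0 + x1*t + x2*t^2 by y0 + y1*t + y2*t^2 using t^3 = 2."""
    z = [0] * 5
    for i in range(3):
        for j in range(3):
            z[i + j] += x[i] * y[j]
    # reduce with t^3 = 2 and t^4 = 2t
    return (z[0] + 2 * z[3], z[1] + 2 * z[4], z[2])


def det3(M):
    """Determinant of a 3-by-3 matrix by cofactor expansion along the first row."""
    (m00, m01, m02), (m10, m11, m12), (m20, m21, m22) = M
    return (m00 * (m11 * m22 - m12 * m21)
            - m01 * (m10 * m22 - m12 * m20)
            + m02 * (m10 * m21 - m11 * m20))


def norm_formula(a, b, c):
    return a**3 + 2 * b**3 + 4 * c**3 - 6 * a * b * c


for a in range(-3, 4):
    for b in range(-3, 4):
        for c in range(-3, 4):
            M = mult_matrix(a, b, c)
            # column j of M holds the coordinates of (a + b t + c t^2) * t^j
            for j, e in enumerate([(1, 0, 0), (0, 1, 0), (0, 0, 1)]):
                assert tuple(row[j] for row in M) == mul_cbrt2((a, b, c), e)
            assert M[0][0] + M[1][1] + M[2][2] == 3 * a
            assert det3(M) == norm_formula(a, b, c)
```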

36. In (a), if $\xi$ is any number algebraic over $\mathbb{Q}$ of degree $r$, then the norm relative to $\mathbb{Q}(\xi)/\mathbb{Q}$ of $\xi$ is $(-1)^r M(0)$, where $M(X)$ is the minimal polynomial of $\xi$ over $\mathbb{Q}$. Since $M(1 - (1 - \xi)) = 0$, the minimal polynomial of $1 - \xi$ is the polynomial $M(1 - X)$ adjusted so as to be monic. That is, it is $P(X) = (-1)^r M(1 - X)$. Hence the norm of $1 - \xi$ is $(-1)^r P(0) = (-1)^{2r} M(1) = M(1)$. In the case of the given $\xi$, the minimal polynomial of $\xi$ is $\Phi_n(X)$, and therefore the norm of $1 - \xi$ is $\Phi_n(1)$.

For (b), division of both sides of the identity $\prod_{d \mid n} \Phi_d(X) = X^n - 1$ by $X - 1$ gives $\prod_{d \mid n,\, d > 1} \Phi_d(X) = X^{n-1} + X^{n-2} + \cdots + 1$. Therefore $\prod_{d \mid n,\, d > 1} \Phi_d(1) = n$.

If $n$ is a prime power, say with $n = p^k$, let us see by induction on $k$ that $\Phi_n(1) = p$. The base case of the induction is $k = 1$, and the result of the previous paragraph applies. Assuming that $\Phi_{p^j}(1) = p$ for $1 \le j \le k$, we have $p^{k+1} = \prod_{j=1}^{k+1} \Phi_{p^j}(1) = \Phi_{p^{k+1}}(1) \prod_{j=1}^{k} p$. Therefore $\Phi_{p^{k+1}}(1) = p$, and the induction is complete.

Inducting on $n$, let us now show that $\Phi_n(1) = 1$ if $n$ is divisible by more than one positive prime. The base case of the induction is $n = 2$. Assume that $n = p_1^{k_1} \cdots p_r^{k_r}$ and that the result is known for integers less than $n$. We may assume that $n$ is divisible by at least two positive primes. Then

$$n = \prod_{d \mid n,\, d > 1} \Phi_d(1) = \prod_{i=1}^{r} \Big( \prod_{j=1}^{k_i} \Phi_{p_i^j}(1) \Big) \prod_{\text{other } d} \Phi_d(1),$$

where the “other $d$” are the divisors of $n$ that are divisible by at least two primes. These include $n$ itself. So one of the corresponding factors is $\Phi_n(1)$, and the others are 1 by the inductive hypothesis. The $i$th factor in parentheses is $p_i^{k_i}$ by the result of the previous paragraph, and the product of the factors in parentheses is $n$. Therefore $\Phi_n(1) = 1$, and the induction is complete.
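Both conclusions of (b) can be observed by computing $\Phi_n(X)$ from the identity $X^n - 1 = \prod_{d \mid n} \Phi_d(X)$ and evaluating at 1. The result should be $p$ when $n = p^k$ and 1 when $n$ has two or more prime factors. The following is a minimal Python sketch using exact division of integer polynomials, stored as coefficient lists with the constant term first. The helper names are illustrative only.

```python
from functools import lru_cache


def divide_exact(num, den):
    """Quotient num/den of integer polynomials (constant term first);
    den must be monic and must divide num exactly."""
    num = list(num)
    quotient = [0] * (len(num) - len(den) + 1)
    for i in range(len(quotient) - 1, -1, -1):
        coef = num[i + len(den) - 1]       # current leading coefficient
        quotient[i] = coef
        for j, d in enumerate(den):
            num[i + j] -= coef * d
    if any(num):
        raise ValueError("division is not exact")
    return quotient


@lru_cache(maxsize=None)
def cyclotomic(n):
    """Phi_n(X) as a tuple of coefficients, from X^n - 1 = prod over d | n of Phi_d(X)."""
    poly = [-1] + [0] * (n - 1) + [1]      # X^n - 1
    for d in range(1, n):
        if n % d == 0:
            poly = divide_exact(poly, cyclotomic(d))
    return tuple(poly)


def phi_at_one(n):
    """Phi_n(1), the sum of the coefficients of Phi_n(X)."""
    return sum(cyclotomic(n))


print([(n, phi_at_one(n)) for n in range(2, 13)])
```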
37. For (a), the imaginary part of $p^{-1}x \pm p^{-1}\sqrt{-1}$ is not an integer, and therefore $p^{-1}(x \pm \sqrt{-1})$ is not a Gaussian integer. Consequently $p$ does not divide either of $x \pm \sqrt{-1}$. Since $p$ divides $x^2 + 1 = (x + \sqrt{-1})(x - \sqrt{-1})$ in $\mathbb{Z}$ and hence in $\mathbb{Z}[\sqrt{-1}]$, $p$ is not prime in $\mathbb{Z}[\sqrt{-1}]$.

For (b), it follows since $p$ is not prime that $p = \alpha\beta$ nontrivially in $\mathbb{Z}[\sqrt{-1}]$. Then $p^2 = N(p) = N(\alpha)N(\beta)$. Problem 34c shows that nontrivial factorization implies that $N(\alpha)$ and $N(\beta)$ are not units. Thus they are both $p$. If $\alpha = a + b\sqrt{-1}$, then the equation $p = N(\alpha)$ says that $p = a^2 + b^2$.

38. Let $N$ be the norm function in $\mathbb{Q}(\sqrt{-2})$. Since $p$ divides $x^2 + 2 = (x + \sqrt{-2})(x - \sqrt{-2})$ and since neither of $p^{-1}(x \pm \sqrt{-2})$ is of the form $a + b\sqrt{-2}$ with $a$ and $b$ in $\mathbb{Z}$, $p$ is not prime in $\mathbb{Z}[\sqrt{-2}]$. Write $p = \alpha\beta$ nontrivially. Then $p^2 = N(p) = N(\alpha)N(\beta)$ and $N(\alpha) = N(\beta) = p$. If $\alpha = a + b\sqrt{-2}$, then $p = N(\alpha)$ says that $p = a^2 + 2b^2$.

39. This is similar to Problem 38 except that the members of the ring are of the form $a + b\sqrt{-3}$ with $a, b$ in $\mathbb{Z}$ or $a, b$ in $\mathbb{Z} + \tfrac12$. Thus $p = N(\alpha)$ says that $p = a^2 + 3b^2$ either with $a, b$ in $\mathbb{Z}$ or with $a, b$ in $\mathbb{Z} + \tfrac12$. In the latter case, let $\omega^{\pm 1} = \tfrac12(-1 \pm \sqrt{-3})$. These have $N(\omega^{\pm 1}) = 1$. Therefore

$$\begin{aligned}
p = N(\alpha) = N(\alpha\omega^{\pm 1}) &= N\big((a + b\sqrt{-3})(-\tfrac12 \pm \tfrac12\sqrt{-3})\big) \\
&= N\big(\tfrac12(-a \mp 3b) + \tfrac12(\pm a - b)\sqrt{-3}\big) \\
&= \big(\tfrac12(-a \mp 3b)\big)^2 + 3\big(\tfrac12(\pm a - b)\big)^2.
\end{aligned}$$

Since $a, b$ are in $\mathbb{Z} + \tfrac12$, one of $a + b$ and $a - b$ is even, and the other is odd, the sum $2a$ being odd. If $a + b$ is even, then $a - 3b$ is even since their difference $4b$ is even, and vice versa. Hence one of the two choices of sign exhibits $p$ as $c^2 + 3d^2$ with $c, d$ in $\mathbb{Z}$.
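Problems 37-39 conclude that an odd prime $p$ dividing some $x^2 + k$ can be written as $a^2 + kb^2$, for $k = 1, 2, 3$. A brute-force search over small primes illustrates this. The following is a minimal Python sketch. It is restricted to odd primes, because for $k = 3$ the prime 2 divides $1^2 + 3$ but is not of the form $a^2 + 3b^2$. The function names are illustrative only.

```python
import math


def primes_below(limit):
    """Odd primes below limit, by a simple sieve of Eratosthenes."""
    is_prime = [True] * limit
    is_prime[0:2] = [False, False]
    for i in range(2, math.isqrt(limit - 1) + 1):
        if is_prime[i]:
            for j in range(i * i, limit, i):
                is_prime[j] = False
    return [p for p in range(3, limit) if is_prime[p]]


def divides_some_x2_plus_k(p, k):
    """True if p divides x^2 + k for some integer x (enough to test 0 <= x < p)."""
    return any((x * x + k) % p == 0 for x in range(p))


def represent(p, k):
    """Return (a, b) with a^2 + k*b^2 == p and a, b >= 0, or None if there is none."""
    for b in range(math.isqrt(p // k) + 1):
        rest = p - k * b * b
        a = math.isqrt(rest)
        if a * a == rest:
            return (a, b)
    return None


for k in (1, 2, 3):
    for p in primes_below(500):
        if divides_some_x2_plus_k(p, k):
            assert represent(p, k) is not None, (p, k)
```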

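40. Write $L' = k(x)$ by the Theorem of the Primitive Element, and let $K$ be a splitting field of the minimal polynomial of $x$ over $k$. Then $K$ is a finite Galois extension of $k$ by Corollary 9.30, and we have $k \subseteq L \subseteq L' \subseteq K$. Let $G = \operatorname{Gal}(K/k)$, $H = \operatorname{Gal}(K/L)$, and $H' = \operatorname{Gal}(K/L')$. For $a$ in $L'$ and $b$ in $L$, Corollary 9.58 says that $N_{L'/k}(a) = \prod_{\sigma \in G/H'} \sigma(a)$, $N_{L/k}(b) = \prod_{\sigma \in G/H} \sigma(b)$, and $N_{L'/L}(a) = \prod_{\tau \in H/H'} \tau(a)$. Hence

$$N_{L/k}(N_{L'/L}(a)) = \prod_{\sigma \in G/H} \sigma(N_{L'/L}(a)) = \prod_{\sigma \in G/H} \sigma\Big(\prod_{\tau \in H/H'} \tau(a)\Big) = \prod_{\sigma \in G/H} \prod_{\tau \in H/H'} \sigma\tau(a) = \prod_{\sigma \in G/H'} \sigma(a) = N_{L'/k}(a).$$

The formula for traces follows similarly by replacing the products by sums in the above computation.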

41. Since $P$ is symmetric, $P(X_{\sigma(1)}, \dots, X_{\sigma(n)}) = P(X_1, \dots, X_n)$ for every permutation $\sigma$. Therefore $P(r_{\sigma(1)}, \dots, r_{\sigma(n)}) = P(r_1, \dots, r_n)$ for every $\sigma$. Problem 39e at the end of Chapter VIII implies that $P(r_1, \dots, r_n) = Q(s_1, \dots, s_n)$ for a polynomial $Q(X_1, \dots, X_n)$ in $k[X_1, \dots, X_n]$, where $s_1, \dots, s_n$ are the elementary symmetric polynomials in $r_1, \dots, r_n$. The elements $s_1, \dots, s_n$ are the coefficients of $F(X)$, up to sign, and hence are in $k$. Therefore $P(r_1, \dots, r_n) = Q(s_1, \dots, s_n)$ is in $k$.

42. Inspection of the formula gives $H_1(X) = \prod_{i=1}^{m} G(X - r_i)$. For each $i$, we can expand $G(X - r_i)$ in powers of $X$ as

$$G(X - r_i) = X^n + b_{n-1}(r_i)X^{n-1} + \cdots + b_1(r_i)X + b_0(r_i),$$

and each of $b_{n-1}, \dots, b_0$ is a member of $k[X]$. When we multiply these for $1 \le i \le m$, each power of $X$ in the product has a coefficient that is unchanged if we permute $r_1, \dots, r_m$. Problem 41 says that the coefficient of each power of $X$ is therefore in $k$. Thus $H_1(X)$ is in $k[X]$. A similar argument shows that $H_2(X)$ is in $k[X]$.

43. For (a), we use $F(X) = X^2 - 2$ and $G(X) = X^2 - 3$ in the previous problem. Then $\sqrt{2} + \sqrt{3}$ is a root of

$$(X - (\sqrt{2} + \sqrt{3}))(X - (\sqrt{2} - \sqrt{3}))(X - (-\sqrt{2} + \sqrt{3}))(X - (-\sqrt{2} - \sqrt{3})),$$

which must have coefficients in $\mathbb{Q}$.
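Multiplying out the four linear factors numerically shows which polynomial with rational coefficients arises, namely $X^4 - 10X^2 + 1$. The following is a minimal Python sketch in floating point. It rounds the coefficients at the end, so it illustrates the result rather than proving it.

```python
import math
from itertools import product


def poly_mul(f, g):
    """Multiply polynomials given as coefficient lists, constant term first."""
    h = [0.0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            h[i + j] += a * b
    return h


s2, s3 = math.sqrt(2), math.sqrt(3)
poly = [1.0]
for e1, e2 in product((1, -1), repeat=2):
    root = e1 * s2 + e2 * s3
    poly = poly_mul(poly, [-root, 1.0])   # multiply by (X - root)

coeffs = [round(c) for c in poly]
print(coeffs)   # [1, 0, -10, 0, 1], that is, X^4 - 10X^2 + 1
```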

44. Proposition 4.40 extends the action by an element $\sigma$ in $\mathfrak{S}_n$ uniquely from the set $\{r_1, \dots, r_n\}$ to $k[r_1, \dots, r_n]$ fixing $k$. The extended $\sigma$ is a one-one homomorphism of $k[r_1, \dots, r_n]$ into itself, hence into $k(r_1, \dots, r_n)$. It extends uniquely to a field mapping of $k(r_1, \dots, r_n)$ into itself by Proposition 8.6. The homomorphism corresponding to a composition is the composition of the homomorphisms, and consequently the homomorphism corresponding to $\sigma^{-1}$ is a two-sided inverse of the homomorphism corresponding to $\sigma$. Thus the extension of $\sigma$ is an automorphism, as required.

Conclusion (a) is immediate from Problem 20a. For (b), since $K$ is generated by $k$ and $r_1, \dots, r_n$, $K$ is certainly generated by $K^{\mathfrak{S}_n}$ and $r_1, \dots, r_n$. We have arranged that $F(X)$ splits over $K$, and hence $K$ is the splitting field. Conclusion (d) follows from Corollary 9.37 once (c) is proved. Thus we are to prove (c).

The argument for (c) is similar to that in Problem 21. Since $F(X)$ is in $K^{\mathfrak{S}_n}[X]$, its coefficients are in $K^{\mathfrak{S}_n}$. Thus $k(u_1, \dots, u_n) \subseteq K^{\mathfrak{S}_n}$. Consequently Corollary 9.37 gives $n! = [K : K^{\mathfrak{S}_n}] \le [K : k(u_1, \dots, u_n)]$. Problem 16 shows that the right side divides $n!$. Therefore equality holds throughout, and we see that $[K : K^{\mathfrak{S}_n}] = [K : k(u_1, \dots, u_n)]$. Since $k(u_1, \dots, u_n) \subseteq K^{\mathfrak{S}_n}$, we must have $k(u_1, \dots, u_n) = K^{\mathfrak{S}_n}$.

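45. For (a), we have

$$\begin{aligned}
c_1 &= \sum_i \theta_i = 2\sum_{i<j} s_i s_j = 2p, \\
c_2 &= \sum_{i<j} \theta_i\theta_j = \sum_{i<j} s_i^2 s_j^2 + 3\sum_{i<j<k} s_i s_j s_k (s_i + s_j + s_k) + 6 s_1 s_2 s_3 s_4, \\
c_3 &= \theta_1\theta_2\theta_3 = \sum_{\substack{i,j,k \\ \text{unequal}}} s_i^3 s_j^2 s_k + 2 s_1 s_2 s_3 s_4 \Big(\sum_i s_i^2\Big) + 2\sum_{i<j<k} s_i^2 s_j^2 s_k^2 + 4 s_1 s_2 s_3 s_4 \Big(\sum_{i<j} s_i s_j\Big).
\end{aligned}$$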
Part (b) is a calculation with symmetric polynomials and is omitted. For (c), we have

$$\begin{aligned}
\theta_1 - \theta_2 &= -(s_1 - s_4)(s_2 - s_3), \\
\theta_1 - \theta_3 &= -(s_1 - s_3)(s_2 - s_4), \\
\theta_2 - \theta_3 &= -(s_1 - s_2)(s_3 - s_4).
\end{aligned}$$

The square of the product of the left sides is the discriminant of the cubic resolvent, and the square of the product of the right sides is the discriminant of the given quartic.
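The formulas of (a) and the three differences in (c) are polynomial identities in $s_1, \dots, s_4$, so they can be tested on a grid of small integers. The problem statement is not part of this excerpt. The sketch below therefore assumes $\theta_1 = (s_1 + s_2)(s_3 + s_4)$, $\theta_2 = (s_1 + s_3)(s_2 + s_4)$, and $\theta_3 = (s_1 + s_4)(s_2 + s_3)$, which are the definitions consistent with the differences displayed in (c). It checks $c_1$ against $2\sum_{i<j} s_i s_j$ without imposing $\sum s_i = 0$.

```python
from itertools import combinations, permutations, product


def thetas(s):
    """Assumed: theta_1 = (s1+s2)(s3+s4), theta_2 = (s1+s3)(s2+s4), theta_3 = (s1+s4)(s2+s3)."""
    s1, s2, s3, s4 = s
    return ((s1 + s2) * (s3 + s4), (s1 + s3) * (s2 + s4), (s1 + s4) * (s2 + s3))


def check(s):
    t1, t2, t3 = thetas(s)
    s1, s2, s3, s4 = s
    P = s1 * s2 * s3 * s4
    e2 = sum(s[i] * s[j] for i, j in combinations(range(4), 2))
    ok1 = t1 + t2 + t3 == 2 * e2
    ok2 = t1 * t2 + t1 * t3 + t2 * t3 == (
        sum(s[i] ** 2 * s[j] ** 2 for i, j in combinations(range(4), 2))
        + 3 * sum(s[i] * s[j] * s[k] * (s[i] + s[j] + s[k])
                  for i, j, k in combinations(range(4), 3))
        + 6 * P)
    ok3 = t1 * t2 * t3 == (
        sum(s[i] ** 3 * s[j] ** 2 * s[k] for i, j, k in permutations(range(4), 3))
        + 2 * P * sum(x * x for x in s)
        + 2 * sum(s[i] ** 2 * s[j] ** 2 * s[k] ** 2 for i, j, k in combinations(range(4), 3))
        + 4 * P * e2)
    ok_c = (t1 - t2 == -(s1 - s4) * (s2 - s3)
            and t1 - t3 == -(s1 - s3) * (s2 - s4)
            and t2 - t3 == -(s1 - s2) * (s3 - s4))
    return ok1 and ok2 and ok3 and ok_c


assert all(check(s) for s in product(range(-2, 3), repeat=4))
```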

46. In (a), the subgroups in question are

$$H = \{(1), (1\,2)(3\,4), (1\,3)(2\,4), (1\,4)(2\,3)\}$$

and $\mathfrak{A}_4$. In (b), one considers the possibilities for a Sylow 2-subgroup and is led to conclude that the only possibilities for the subgroups in question are the powers of a 4-cycle, the dihedral group (generated by $H$ and $(1\,2\,3\,4)$), and $\mathfrak{S}_4$. (Since $H$ is normal in $\mathfrak{S}_4$, the group $H$ and any 2-cycle generate a dihedral group of order 8. That group is also generated by $H$ and a 4-cycle, so no further possibility arises.)

47. In (a), the discriminant reduces when $q = 0$ to $16p^4r - 128p^2r^2 + 256r^3 = 16r(p^4 - 8p^2r + 16r^2) = 16r(p^2 - 4r)^2$. This is 0 if $r = 0$ or $r = p^2/4$. If it is nonzero, it is a square if and only if $r$ is a square. Hence in all cases it is a square if and only if $r$ is a square.
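The factorization in (a) can be checked against the general discriminant of $X^4 + pX^2 + qX + r$. The following is a minimal Python sketch. It assumes the standard closed formula $16p^4r - 4p^3q^2 - 128p^2r^2 + 144pq^2r - 27q^4 + 256r^3$, which reduces to the expression in the text when $q = 0$.

```python
def quartic_disc(p, q, r):
    """Discriminant of X^4 + pX^2 + qX + r (standard closed formula)."""
    return (16 * p**4 * r - 4 * p**3 * q**2 - 128 * p**2 * r**2
            + 144 * p * q**2 * r - 27 * q**4 + 256 * r**3)


# With q = 0 the discriminant should factor as 16 r (p^2 - 4r)^2.
for p in range(-8, 9):
    for r in range(-8, 9):
        assert quartic_disc(p, 0, r) == 16 * r * (p * p - 4 * r) ** 2
```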

In (b), let $Y = X^2$. The equation is $Y^2 + pY + r = 0$, which can be solved with a square root. For each of the two solutions, we can then solve for $X$ with a square root. Hence all the roots lie in an extension obtained by adjoining at most three square roots. Thus $[K : \mathbb{Q}]$ divides 8, and $|G|$ divides 8. Consequently $G$ cannot have any element of order 3.

In (c), the irreducibility shows that the possibilities for $G$ are as in Problem 46. Since $r$ is a square, the discriminant is a square, by (a). Proposition 9.63 shows that the possibilities are as in Problem 46a. Part (b) rules out $\mathfrak{A}_4$, and then (c) follows.

In (d), $r$ nonsquare and $F(X)$ irreducible imply that $G$ is a transitive subgroup of $\mathfrak{S}_4$ but not a subgroup of $\mathfrak{A}_4$, by (a). Problem 46b shows that $G$ is $\mathfrak{S}_4$, or the powers of a 4-cycle, or the dihedral group $D_4$. By (b), there is no element of order 3, and $\mathfrak{S}_4$ is therefore ruled out.

48. The polynomial remains irreducible when reduced modulo 2, and a prime factorization modulo 3 is $(X + 2)(X^3 + X^2 + X + 2)$. Thus $G$ is a transitive subgroup of $\mathfrak{S}_4$ containing a 3-cycle. The discriminant is $256 - 27 = 229$, not a square. By Problem 46b, $G = \mathfrak{S}_4$.
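The ingredients of Problem 48 for $X^4 + X + 1$ can be verified by machine: the discriminant, the factorization modulo 3, and the absence of roots of the cubic factor modulo 3. The following is a minimal Python sketch. It assumes the standard discriminant formula for $X^4 + pX^2 + qX + r$, and it stores polynomials as coefficient lists with the constant term first. It only checks for roots modulo 2, so it does not by itself prove the full irreducibility modulo 2.

```python
import math


def quartic_disc(p, q, r):
    """Discriminant of X^4 + pX^2 + qX + r (standard closed formula)."""
    return (16 * p**4 * r - 4 * p**3 * q**2 - 128 * p**2 * r**2
            + 144 * p * q**2 * r - 27 * q**4 + 256 * r**3)


def poly_mul_mod(f, g, m):
    """Product of coefficient lists (constant term first), reduced modulo m."""
    h = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            h[i + j] = (h[i + j] + a * b) % m
    return h


def roots_mod(f, m):
    """Residues x in 0..m-1 with f(x) = 0 mod m."""
    return [x for x in range(m) if sum(c * x**i for i, c in enumerate(f)) % m == 0]


disc = quartic_disc(0, 1, 1)                  # X^4 + X + 1
print(disc, math.isqrt(disc) ** 2 == disc)    # 229 False

# (X + 2)(X^3 + X^2 + X + 2) should reduce to X^4 + X + 1 modulo 3
print(poly_mul_mod([2, 1], [2, 1, 1, 1], 3))  # [1, 1, 0, 0, 1]
print(roots_mod([2, 1, 1, 1], 3))             # []: the cubic factor has no root mod 3
print(roots_mod([1, 1, 0, 0, 1], 2))          # []: no linear factor mod 2
```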

49. Part (a) is just a computation; the answer is $2^{12}3^4$. The factorization in (b) is routine to check, and the only issue is the irreducibility of the cubic factor. For a cubic polynomial, irreducibility follows if the polynomial has no root in the field. Thus we need only verify that none of 0, 1, 2, 3, 4 is a root modulo 5.


For (c), the conclusion of (b) shows that the only possible reducibility over $\mathbb{Q}$ is into a degree-one factor and a cubic factor. For $X^4 + 8X + 12$ to have a degree-one factor, it must have a rational root, and this root must be an integer dividing 12. Let $r$ be an integer dividing 12. If $r$ is even, then $r^4 + 8r$ is divisible by 16, but 12 is not; so an even $r$ cannot be a root. We are left with $\pm 1$ and $\pm 3$ as the possibilities, and we check that none of these is a root.

In (d), $F(X)$ is irreducible, and $G$ is transitive. It is a subgroup of $\mathfrak{A}_4$ since the discriminant is a square. By (b) and Theorem 9.64, $G$ contains a 3-cycle. Problem 46a shows therefore that $G = \mathfrak{A}_4$.
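The computations behind Problem 49 for $X^4 + 8X + 12$ can be reproduced: the discriminant $2^{12}3^4$, the factorization modulo 5 into a linear factor and a cubic with no roots, and the absence of integer roots. The following is a minimal Python sketch. It assumes the standard discriminant formula for $X^4 + pX^2 + qX + r$. The particular form $(X + 1)(X^3 + 4X^2 + X + 2)$ of the mod-5 factorization is computed here, since the factorization given in (b) is not reproduced in this excerpt.

```python
import math


def quartic_disc(p, q, r):
    """Discriminant of X^4 + pX^2 + qX + r (standard closed formula)."""
    return (16 * p**4 * r - 4 * p**3 * q**2 - 128 * p**2 * r**2
            + 144 * p * q**2 * r - 27 * q**4 + 256 * r**3)


def poly_mul_mod(f, g, m):
    """Product of coefficient lists (constant term first), reduced modulo m."""
    h = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            h[i + j] = (h[i + j] + a * b) % m
    return h


def roots_mod(f, m):
    """Residues x in 0..m-1 with f(x) = 0 mod m."""
    return [x for x in range(m) if sum(c * x**i for i, c in enumerate(f)) % m == 0]


disc = quartic_disc(0, 8, 12)
print(disc == 2**12 * 3**4, math.isqrt(disc) ** 2 == disc)   # True True

# modulo 5, X^4 + 8X + 12 becomes X^4 + 3X + 2 = (X + 1)(X^3 + 4X^2 + X + 2)
product_mod5 = poly_mul_mod([1, 1], [2, 1, 4, 1], 5)
cubic_roots = roots_mod([2, 1, 4, 1], 5)

# rational roots would be integers dividing 12
candidates = [s * d for d in range(1, 13) if 12 % d == 0 for s in (1, -1)]
int_roots = [x for x in candidates if x**4 + 8 * x + 12 == 0]
print(product_mod5, cubic_roots, int_roots)   # [2, 3, 0, 0, 1] [] []
```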

50. We saw in Problem 49 that $G = \mathfrak{A}_4$ for $X^4 + 8X + 12$, in Problem 47c that $G = \{(1), (1\,2)(3\,4), (1\,3)(2\,4), (1\,4)(2\,3)\}$ for $X^4 + 1$, in Problem 48 that $G = \mathfrak{S}_4$ for $X^4 + X + 1$, and via Eisenstein's criterion that $G = C_4$ for $\Phi_5(X)$. Since $X^4 - 2$ does not split in $\mathbb{Q}(\sqrt[4]{2})$, the Galois group in this case cannot be of order 4, and Problem 47d shows that $G$ must be $D_4$ in this case.

51. For (a), let $C$ correspond to a set $I$ of polynomials of degree at most $n - 1$. If $C$ is cyclic, then $I$ is at least a vector space over $\mathbb{F}$. If $F(X) = c_0 + c_1X + \cdots + c_{n-1}X^{n-1}$ is in $I$, then $XF(X) = c_0X + c_1X^2 + \cdots + c_{n-1}X^n$ is congruent modulo $(X^n - 1)$ to $c_{n-1} + c_0X + \cdots + c_{n-2}X^{n-1}$, which is in $I$ since $C$ is cyclic. Hence $I$ is closed under multiplication by $X$ mod $(X^n - 1)$ and hence under arbitrary multiplications modulo $(X^n - 1)$. Therefore $I$ is an ideal in $\mathbb{F}[X]/(X^n - 1)$.

Conversely if $I$ is an ideal in $\mathbb{F}[X]/(X^n - 1)$, then it is a vector space and is closed under multiplication by $X$ mod $(X^n - 1)$ in $\mathbb{F}[X]/(X^n - 1)$. If $F(X) = c_0 + c_1X + \cdots + c_{n-1}X^{n-1}$ is in $I$, then $XF(X) = c_0X + c_1X^2 + \cdots + c_{n-1}X^n \equiv c_{n-1} + c_0X + \cdots + c_{n-2}X^{n-1} \bmod (X^n - 1)$ has to be in $I$, and the corresponding member of $C$ is $(c_{n-1}, c_0, c_1, \dots, c_{n-2})$. Hence $C$ is cyclic.

For the remaining parts, we identify the cyclic code $C$ with the corresponding ideal $I$ in $\mathbb{F}[X]/(X^n - 1)$. In (b), let the lowest degree of a nonzero member of $I$ be $n - k$, and let $G(X)$ be a monic member of $I$ of this degree. If there is a second monic member of this same degree, then their difference has lower degree since both polynomials are monic, and the difference is a nonzero member of $I$, contradiction. Thus $G(X)$ is uniquely defined. Regard $G(X)$ as a member of $\mathbb{F}[X]$ of degree $n - k$, and let $M(X) = \operatorname{GCD}(G(X), X^n - 1)$. Then we can choose $A(X)$ and $B(X)$ in $\mathbb{F}[X]$ with $A(X)(X^n - 1) + B(X)G(X) = M(X)$. Passing to $\mathbb{F}[X]/(X^n - 1)$, we have $B(X)G(X) \equiv M(X) \bmod (X^n - 1)$. Therefore $M(X)$ is in the ideal $I$. Since the degree of $M(X)$ is at most $\deg G(X)$ and since $G(X)$ has the minimum degree among the nonzero members of $I$, either $M(X) = 0$ or $M(X) = G(X)$. The conclusion $M(X) = 0$ is ruled out since $M(X)$ is a greatest common divisor of nonzero polynomials, and thus $M(X) = G(X)$. Therefore $G(X)$ divides $X^n - 1$.

Let $\tilde I$ be the inverse image of $I$ in $\mathbb{F}[X]$. This is an ideal, it contains $G(X)$, and it contains no nonzero element of degree $< \deg G(X)$. Since $\tilde I$ has to be principal, $\tilde I = (G(X))$. In other words, $\tilde I$ consists of all products of $G(X)$ by a member of $\mathbb{F}[X]$. If $F(X)G(X)$ is such a product, then the division algorithm gives $F(X)G(X) = B(X)(X^n - 1) + R(X)$ with $R(X) = 0$ or $\deg R < n$. Since $G(X)$ divides $X^n - 1$, $G(X)$ divides $R(X)$. Therefore every member of $\tilde I$ is congruent modulo $X^n - 1$ to a product $G(X)S(X)$ that is 0 or has degree $< n$. Then (c) is clear.

For (d), (b) showed that $G(X)$ divides $X^n - 1$ in $\mathbb{F}[X]$. Write $X^n - 1 = G(X)H(X)$. If $B(X)$ in $\mathbb{F}[X]/(X^n - 1)$ corresponds to a member of $C$, then (b) shows that $B(X) = F(X)G(X)$ for some $F(X)$ in $\mathbb{F}[X]$. Multiplying by $H(X)$ gives $B(X)H(X) = F(X)G(X)H(X) = F(X)(X^n - 1)$. Hence $B(X)H(X) \equiv 0 \bmod (X^n - 1)$. Conversely if $B(X)H(X) = A(X)(X^n - 1)$, then $B(X)H(X) = A(X)G(X)H(X)$, and $B(X) = A(X)G(X)$.
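Parts (a) through (d) can be watched in a small example over $\mathbb{F} = GF(2)$. Polynomials are encoded as integers, with bit $i$ holding the coefficient of $X^i$. The example takes $n = 7$ and $G(X) = 1 + X + X^3$, so that $H(X) = 1 + X + X^2 + X^4$. This choice of $n$ and $G$ is a hypothetical illustration, not taken from the problem. The sketch confirms that the multiples of $G(X)$ form a code closed under cyclic shifts, and that they are exactly the words $B(X)$ with $B(X)H(X) \equiv 0 \bmod (X^7 - 1)$.

```python
def gf2_mul(a, b):
    """Product in GF(2)[X]; bit i of an int is the coefficient of X^i."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def gf2_mod(a, m):
    """Remainder of a modulo the nonzero polynomial m in GF(2)[X]."""
    dm = m.bit_length()
    while a.bit_length() >= dm:
        a ^= m << (a.bit_length() - dm)
    return a


n = 7
XN1 = (1 << n) | 1      # X^7 - 1, which equals X^7 + 1 over GF(2)
G = 0b1011              # hypothetical example: G(X) = 1 + X + X^3
H = 0b10111             # H(X) = 1 + X + X^2 + X^4
assert gf2_mul(G, H) == XN1          # so X^7 - 1 = G(X) H(X)

# the code: all F(X) G(X) with deg F < k = 4, i.e. the ideal generated by G(X)
code = {gf2_mul(f, G) for f in range(1 << 4)}

# cyclic: multiplying by X modulo X^7 - 1 stays inside the code
assert all(gf2_mod(c << 1, XN1) in code for c in code)

# part (d): B(X) is in the code exactly when B(X) H(X) = 0 mod (X^7 - 1)
kernel = {b for b in range(1 << n) if gf2_mod(gf2_mul(b, H), XN1) == 0}
assert kernel == code
```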

52. In (a), if $r_1, r_2, r_3$ denote the rows and if $v_1 = r_1 + r_3$, $v_2 = r_2$, and $v_3 = r_3$, then $v_1, v_2, v_3$ form a basis for the row space, and they cycle into one another when the columns are shifted in cyclic fashion. Consequently the code is cyclic. Part (b) involves looking at the 7 nonzero members of the space, and one can just do that directly.

In (c), one such matrix is

$$H = \begin{pmatrix}
0 & 0 & 0 & 1 & 1 & 0 & 1 \\
0 & 0 & 1 & 1 & 0 & 1 & 0 \\
0 & 1 & 1 & 0 & 1 & 0 & 0 \\
1 & 1 & 0 & 1 & 0 & 0 & 0
\end{pmatrix}.$$

A little check shows that the matrix product $H\mathcal{G}^t$ is the 4-by-3 zero matrix, where $\mathcal{G}$ is the given 3-by-7 matrix whose rows span $C$. Hence $Hv = 0$ for each $v$ in $C$. Thus $C$ is contained in the null space of $H$. The rank of $H$ is 4 since the rows are certainly linearly independent. Since the sum of the rank and the dimension of the null space is the number of columns, namely 7, the dimension of the null space is 3. Therefore the null space is $C$ and is no larger.
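The null space of the matrix $H$ in (c) is small enough to enumerate over $GF(2)$. The following is a minimal brute-force Python sketch. It confirms that the null space has $2^3 = 8$ elements, is closed under cyclic shifts, and has every nonzero member of weight 4.

```python
from itertools import product

H = [
    [0, 0, 0, 1, 1, 0, 1],
    [0, 0, 1, 1, 0, 1, 0],
    [0, 1, 1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0, 0, 0],
]


def in_null_space(v):
    """True if Hv = 0 over GF(2)."""
    return all(sum(h * x for h, x in zip(row, v)) % 2 == 0 for row in H)


def shift(v):
    """Cyclic shift of the coordinates one place to the right."""
    return v[-1:] + v[:-1]


null_space = [v for v in product((0, 1), repeat=7) if in_null_space(v)]
weights = sorted({sum(v) for v in null_space if any(v)})
is_cyclic = all(shift(v) in null_space for v in null_space)
print(len(null_space), weights, is_cyclic)   # 8 [4] True
```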

For (d), the general matrix $H$ is to have $n$ columns and $n - k$ rows. The entries of the top row are the coefficients of $H(X)$ with the constant term at the right, the coefficient of $X$ in the next-to-last position, and so on. In each successive row these coefficients are shifted one position to the left.

Let $G(X) = g_0 + g_1X + \cdots + g_{n-k}X^{n-k}$ and $H(X) = h_0 + h_1X + \cdots + X^k$. We know that $\{G(X), XG(X), X^2G(X), \dots, X^{k-1}G(X)\}$ is a basis of $C$. In terms of members of $\mathbb{F}^n$, the $l$th such vector has the entries $g_0, g_1, \dots, g_{n-k}$ beginning in the $l$th position. The $(1, j)$th entry of $H$ is $h_{n-j}$ with 0's elsewhere in the row, and the $(i, j)$th entry is $h_{n-j-i+1}$ with 0's elsewhere in the row. The product of the $i$th row of $H$ and the $l$th basis vector of $C$ is $\sum_{j=l}^{n-i+1} h_{n-j-i+1}\,g_{j-l}$, which is the coefficient of $X^{n-i+1-l}$ in $G(X)H(X)$. Here $1 \le i \le n - k$ and $1 \le l \le k$, so that $2 \le i + l \le n$. Thus the power of $X$ in question varies from 1 to $n - 1$. Since $G(X)H(X) = X^n - 1$, the coefficient is 0. Thus $C$ lies in the null space of $H$. The same argument with rank as in the previous paragraph shows that $C$ is exactly the null space.

53. Since $X^n - 1$ has derivative $nX^{n-1}$, we have $\operatorname{GCD}(X^n - 1, nX^{n-1}) = 1$ when $n$ is odd. Lemma 9.26 then shows that $X^n - 1$ is separable. If $n$ is even, write $n = 2k$. Then $X^n - 1 = (X^k - 1)^2$ in characteristic 2 by Lemma 9.18, and hence every root has multiplicity at least 2.
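The criterion of Problem 53 can be tested over $GF(2)$. The sketch computes $\operatorname{GCD}(X^n - 1, D(X^n - 1))$, where $D$ is the formal derivative, and checks that it equals 1 exactly for odd $n$. For even $n = 2k$ it also checks that $X^n - 1 = (X^k - 1)^2$. Polynomials are encoded as integers with bit $i$ for $X^i$. This is a minimal Python sketch with illustrative helper names.

```python
def gf2_mul(a, b):
    """Product in GF(2)[X]; bit i of an int is the coefficient of X^i."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def gf2_mod(a, m):
    """Remainder of a modulo the nonzero polynomial m in GF(2)[X]."""
    dm = m.bit_length()
    while a.bit_length() >= dm:
        a ^= m << (a.bit_length() - dm)
    return a


def gf2_gcd(a, b):
    """Euclidean algorithm in GF(2)[X]."""
    while b:
        a, b = b, gf2_mod(a, b)
    return a


def gf2_derivative(a):
    """Formal derivative: X^i -> i X^(i-1), and i = 0 in GF(2) when i is even."""
    result = 0
    for i in range(1, a.bit_length()):
        if (a >> i) & 1 and i % 2 == 1:
            result |= 1 << (i - 1)
    return result


for n in range(1, 21):
    f = (1 << n) | 1                       # X^n - 1 = X^n + 1 over GF(2)
    separable = gf2_gcd(f, gf2_derivative(f)) == 1
    assert separable == (n % 2 == 1)
    if n % 2 == 0:
        k = n // 2
        # X^n - 1 = (X^k - 1)^2 in characteristic 2
        assert gf2_mul((1 << k) | 1, (1 << k) | 1) == f
```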


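54. In (a), we have $0 = P(\alpha^j) = c_0 + c_1\alpha^j + c_2\alpha^{2j} + \cdots + c_{n-1}\alpha^{(n-1)j}$ for $r \le j \le r + s$, and therefore the column vector $(c_0, c_1, \dots, c_{n-1})$ satisfies

$$\begin{pmatrix}
1 & \alpha^{r} & \alpha^{2r} & \cdots & \alpha^{(n-1)r} \\
1 & \alpha^{r+1} & \alpha^{2(r+1)} & \cdots & \alpha^{(n-1)(r+1)} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & \alpha^{r+s} & \alpha^{2(r+s)} & \cdots & \alpha^{(n-1)(r+s)}
\end{pmatrix}
\begin{pmatrix} c_0 \\ c_1 \\ c_2 \\ \vdots \\ c_{n-1} \end{pmatrix}
=
\begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}.$$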

In (b), since $s + 1 \le n$, the number $s + 1$ of rows is $\le$ the number $n$ of columns. Any square submatrix of size $s + 1$ is a Vandermonde matrix after factoring a power of $\alpha$ from each column and transposing. Suppose its columns are those multiplying $c_{l_1}, \dots, c_{l_{s+1}}$, with $0 \le l_1 < \cdots < l_{s+1} \le n - 1$. Then the determinant of the square submatrix is the product of a power of $\alpha$ and the differences $\alpha^{l_j} - \alpha^{l_i}$ with $j > i$. Since $\alpha$ is nonzero and since two powers of $\alpha$ can be equal only when the exponents differ by a multiple of $n$, the determinant of the square submatrix is nonzero.

In (c), suppose that $s + 1$ or fewer of the coefficients $c_0, c_1, \dots, c_{n-1}$ are nonzero. Choose $s + 1$ of them, say $c_{l_j}$ for $1 \le j \le s + 1$, such that the remaining ones are 0. If we discard the others from the matrix equation in (a) and discard the corresponding columns of the coefficient matrix, then the matrix equation is still valid since we have discarded only 0's from the given equations. The resulting system is square with an invertible coefficient matrix, and hence the unique solution has $c_{l_j} = 0$ for all $j$. But then $P(X) = 0$, in contradiction to the assumption that $P(X) \ne 0$.

In (d), if some nonzero member $P(X)$ of $C$ has weight less than $s + 2$, then (c) leads to a contradiction. Hence every nonzero weight is $\ge s + 2$, and $\delta(C) \ge s + 2$.

55. Since $\alpha$ is a root of $X^n - 1$, so is every $\alpha^j$. Since $F_j$ is the minimal polynomial of $\alpha^j$, $F_j$ divides $X^n - 1$. Also, $1 + X = X - 1$ divides $X^n - 1$, and no $F_j$ equals $X - 1$, since $\alpha^j \ne 1$ for $1 \le j \le 2e$ when $2e < n$. Therefore $G(X)$ divides $X^n - 1$. Applying Problem 54 with $r = 0$ and $s = 2e$, we see that the code $C$ generated by $G(X)$ has $\delta(C) \ge s + 2 = 2e + 2$.

56. In (a), if an irreducible polynomial $F(X)$ of degree $d$ has a root $\beta$ in $K$, then $K \supseteq \mathbb{F}(\beta) \supseteq \mathbb{F}$, and $[\mathbb{F}(\beta) : \mathbb{F}] = d$ must divide $[K : \mathbb{F}] = m$. In the previous problem it follows that each $F_j(X)$ has degree dividing $m$, hence degree $\le m$. The worst case for the degree of $G(X)$ is that the LCM equals the product, and then the degree of $G(X)$ is the sum of 1 (from $1 + X$) and the sum of the degrees of the $F_j(X)$'s. Hence $\deg G \le 2em + 1$ in all cases.

In (b), let $n_r = 2^r - 1$, and let $\mathbb{K}$ be a field with $2^r$ elements. Theorem 9.14 shows that $\mathbb{K}$ is a splitting field for $X^{2^r} - X$ over $\mathbb{F}$. Hence it is a splitting field for $X^{n_r} - 1$ over $\mathbb{F}$. Let $e = r$, so that $e < n_r/2$ as soon as $r \ge 3$. Using this $e$ in the previous problem, we obtain a cyclic code $C_r$ in $\mathbb{F}^{n_r}$ with $\delta(C_r) \ge 2r + 2$. According to (a), the generating polynomial $G_r(X)$ has degree at most $2er + 1 = 2r^2 + 1$. Therefore $k_r = \dim C_r = n_r - \deg G_r \ge n_r - 2r^2 - 1 = 2^r - 2r^2 - 2$. Then $k_r/n_r$ tends to 1, and $\delta(C_r)$ tends to infinity, as required.
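The two bounds in (b) are easy to tabulate. The Python sketch below uses hypothetical helper names (they are not from the text) and simply evaluates $\delta(C_r) \ge 2r + 2$ and $k_r/n_r \ge (2^r - 2r^2 - 2)/(2^r - 1)$ for a few values of $r$. The rate bound is negative for $r \le 6$ and only becomes informative once $2^r$ dominates $2r^2$.

```python
def rate_lower_bound(r: int) -> float:
    """Lower bound k_r / n_r >= (2^r - 2r^2 - 2) / (2^r - 1) from Problem 56(b)."""
    n_r = 2**r - 1
    return (2**r - 2 * r**2 - 2) / n_r


def designed_distance(r: int) -> int:
    """Lower bound 2r + 2 for the minimum distance delta(C_r)."""
    return 2 * r + 2


for r in range(8, 21, 4):
    print(f"r={r:2d}  n_r={2**r - 1:8d}  delta>={designed_distance(r):3d}  "
          f"k_r/n_r>={rate_lower_bound(r):.4f}")
```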

57. In (a), the polynomial $F_1(X)$ splits over $\mathbb{K}$ because every finite extension of a finite field is Galois. The Galois group $\operatorname{Gal}(\mathbb{K}/\mathbb{F})$ consists of the powers of the Frobenius isomorphism $x \mapsto x^2$, by Proposition 9.40, and is transitive on the roots of $F_1(X)$, by Problem 13a. Hence all the roots are of the form $\alpha^{2^k}$, and all these elements are roots. Taking $k = 0, 1, 2, 3$, we get distinct roots, which is necessary since $\mathbb{K}/\mathbb{F}$ is separable.

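For (b), we start from $1 + \alpha + \alpha^4 = 0$ and compute the powers of $\alpha$ in terms of $1, \alpha, \alpha^2, \alpha^3$. The interest is in only the powers $\alpha^0, \alpha^3, \alpha^6, \alpha^9, \alpha^{12}$, but some of the intermediate powers help in the computation. We have

$$
\begin{aligned}
\alpha^3 &= \alpha^3, \\
\alpha^4 &= 1 + \alpha, \\
\alpha^5 &= \alpha + \alpha^2, \\
\alpha^6 &= \alpha^2 + \alpha^3, \\
\alpha^9 &= \alpha^3\alpha^6 = \alpha^3(\alpha^2 + \alpha^3) = \alpha^5 + \alpha^6 = \alpha + \alpha^3, \\
\alpha^{12} &= (\alpha^2 + \alpha^3)^2 = \alpha^4 + \alpha^6 = 1 + \alpha + \alpha^2 + \alpha^3.
\end{aligned}
$$

Then we form the equation $a + b\alpha^3 + c\alpha^6 + d\alpha^9 + e\alpha^{12} = 0$, substitute from above, and equate coefficients. The result is a homogeneous system of four linear equations with five unknowns in $\mathbb{F}$: the coefficients of $1, \alpha, \alpha^2, \alpha^3$ give $a + e = 0$, $d + e = 0$, $c + e = 0$, and $b + c + d + e = 0$. Solving, we find that the space of solutions is 1-dimensional with $a = b = c = d = e$. Therefore the minimal polynomial of $\alpha^3$ has degree 4 and is $1 + X + X^2 + X^3 + X^4$.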
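The hand computation in (b) can be cross-checked by machine. The sketch below is an illustration, not part of the original solution: it stores an element of $\mathbb{K} = \mathbb{F}_{16}$ as a 4-bit integer whose bit $i$ is the coefficient of $\alpha^i$, assumes $\alpha$ is a root of $1 + X + X^4$ as in this problem, and forms each minimal polynomial as the product of the factors $X - \beta^{2^k}$ over the distinct Frobenius conjugates, as justified by (a).

```python
MOD = 0b10011    # X^4 + X + 1, i.e. alpha^4 = 1 + alpha
ALPHA = 0b0010   # the class of X


def gf_mul(u: int, v: int) -> int:
    """Multiply two elements of GF(16) = F_2[X]/(X^4 + X + 1), stored as 4-bit ints."""
    result = 0
    while v:
        if v & 1:
            result ^= u
        v >>= 1
        u <<= 1              # multiply u by alpha
        if u & 0b10000:      # reduce using alpha^4 = 1 + alpha
            u ^= MOD
    return result


def gf_pow(u: int, k: int) -> int:
    result = 1
    for _ in range(k):
        result = gf_mul(result, u)
    return result


def minimal_polynomial(beta: int):
    """Coefficients (constant term first) of the product of X - c over the distinct
    conjugates c = beta^(2^k); in characteristic 2, X - c equals X + c."""
    conjugates = []
    c = beta
    while c not in conjugates:
        conjugates.append(c)
        c = gf_mul(c, c)                               # Frobenius c -> c^2
    poly = [1]
    for c in conjugates:
        shifted = [0] + poly                           # X * poly
        scaled = [gf_mul(c, a) for a in poly] + [0]    # c * poly
        poly = [s ^ t for s, t in zip(shifted, scaled)]
    return poly


# Bits are printed in the order alpha^3, alpha^2, alpha, 1.
for k in (3, 4, 5, 6, 9, 12):
    print(f"alpha^{k} -> {gf_pow(ALPHA, k):04b}")
print("F_1:", minimal_polynomial(ALPHA))
print("F_3:", minimal_polynomial(gf_pow(ALPHA, 3)))
```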

In (c), we apply Problem 55 with $n = 15$ and $e = 2$. Part (a) shows that $F_1 = F_2 = F_4$, and part (b) computed $F_3$ as something else of degree 4. Therefore
$$G(X) = (1 + X)\operatorname{LCM}(F_1, F_2, F_3, F_4) = (1 + X)\operatorname{LCM}(F_1, F_3) = (1 + X)F_1(X)F_3(X),$$
which has degree 9. Then $\dim C = 15 - 9 = 6$, and Problem 55 gives $\delta(C) \ge 2e + 2 = 6$.
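A brute-force check of (c) is also small enough to run. In the Python sketch below, polynomials over $\mathbb{F}$ are encoded as integers whose bit $i$ is the coefficient of $X^i$ (an encoding chosen for illustration). It multiplies out $G(X) = (1 + X)F_1(X)F_3(X)$ using $F_1$ and $F_3$ as found in (a) and (b), confirms that $G$ divides $X^{15} - 1$, and enumerates all $2^6$ codewords to compare their minimum weight with the bound $\delta(C) \ge 6$.

```python
ONE_PLUS_X = 0b11   # 1 + X
F1 = 0b10011        # 1 + X + X^4, from part (a)
F3 = 0b11111        # 1 + X + X^2 + X^3 + X^4, from part (b)
N = 15


def pmul(p: int, q: int) -> int:
    """Carry-less product of two F_2-polynomials."""
    result = 0
    while q:
        if q & 1:
            result ^= p
        q >>= 1
        p <<= 1
    return result


def pmod(p: int, q: int) -> int:
    """Remainder of p modulo q over F_2."""
    dq = q.bit_length() - 1
    while p and p.bit_length() - 1 >= dq:
        p ^= q << (p.bit_length() - 1 - dq)
    return p


G = pmul(pmul(ONE_PLUS_X, F1), F3)
deg_G = G.bit_length() - 1
dim_C = N - deg_G

# Codewords are m(X) G(X) with deg m < dim C; these have degree < 15,
# so no reduction modulo X^15 - 1 is needed.
codewords = [pmul(m, G) for m in range(2**dim_C)]
min_weight = min(bin(w).count("1") for w in codewords if w)

print("G =", bin(G), "deg G =", deg_G, "dim C =", dim_C, "min weight =", min_weight)
```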

59. In (a), Problems 12–13 are applicable when the scalars are extended to $K$ because the minimal polynomial becomes a product of first-degree factors. The existence in the conclusion is immediate by applying (a) through (d) in Problem 12 to $L \otimes 1$, and the uniqueness is immediate from Problem 13.

In (b), fix a basis $\{v_i\}$ of $V$ over $k$. Any member of $V^K$ has a unique expansion as $\sum_i (v_i \otimes c_i)$ with each $c_i$ in $K$. Since $\varphi(1) = 1$, application of the given identity to $v \otimes 1$ gives
$$\mathcal{T}(v \otimes 1) = \mathcal{T}(1 \otimes \varphi)(v \otimes 1) = (1 \otimes \varphi)\mathcal{T}(v \otimes 1).$$
If we expand $\mathcal{T}(v \otimes 1)$ as $\sum_i (v_i \otimes c_i)$, the displayed equation says that
$$\sum_i (v_i \otimes c_i) = (1 \otimes \varphi)\sum_i (v_i \otimes c_i) = \sum_i (v_i \otimes \varphi(c_i)).$$
Hence $\varphi(c_i) = c_i$ for all $i$. Since $\varphi$ is arbitrary, Theorem 9.38 implies that $c_i$ is in $k$ for all $i$. Thus $\sum_i (v_i \otimes c_i)$ is in $V$. If we write $Tv$ for this element of $V$, then $T$ is a $k$ linear map of $V$ to itself such that $\mathcal{T} = T \otimes 1$.
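In (c), we multiply the identity $L \otimes 1 = \mathcal{S} + \mathcal{N}$ on the left by $1 \otimes \varphi^{-1}$ and on the right by $1 \otimes \varphi$ to obtain
$$L \otimes 1 = (1 \otimes \varphi^{-1})(L \otimes 1)(1 \otimes \varphi) = (1 \otimes \varphi^{-1})\mathcal{S}(1 \otimes \varphi) + (1 \otimes \varphi^{-1})\mathcal{N}(1 \otimes \varphi).$$
The equation $\big((1 \otimes \varphi^{-1})\mathcal{N}(1 \otimes \varphi)\big)^n = (1 \otimes \varphi^{-1})\mathcal{N}^n(1 \otimes \varphi)$ shows that $(1 \otimes \varphi^{-1})\mathcal{N}(1 \otimes \varphi)$ is nilpotent. Since
$$\big((1 \otimes \varphi^{-1})\mathcal{S}(1 \otimes \varphi)\big)\big((1 \otimes \varphi^{-1})\mathcal{N}(1 \otimes \varphi)\big) = (1 \otimes \varphi^{-1})\mathcal{S}\mathcal{N}(1 \otimes \varphi) = (1 \otimes \varphi^{-1})\mathcal{N}\mathcal{S}(1 \otimes \varphi) = \big((1 \otimes \varphi^{-1})\mathcal{N}(1 \otimes \varphi)\big)\big((1 \otimes \varphi^{-1})\mathcal{S}(1 \otimes \varphi)\big),$$
the maps $(1 \otimes \varphi^{-1})\mathcal{S}(1 \otimes \varphi)$ and $(1 \otimes \varphi^{-1})\mathcal{N}(1 \otimes \varphi)$ commute. Finally if $\sum_i (v_i \otimes c_i)$ is an eigenvector of $\mathcal{S}$ with eigenvalue $\lambda$, we have $\mathcal{S}\big(\sum_i (v_i \otimes c_i)\big) = \sum_i (v_i \otimes \lambda c_i)$. Therefore
$$(1 \otimes \varphi^{-1})\mathcal{S}(1 \otimes \varphi)\Big(\sum_i (v_i \otimes \varphi^{-1}(c_i))\Big) = (1 \otimes \varphi^{-1})\sum_i (v_i \otimes \lambda c_i) = \sum_i \big(v_i \otimes \varphi^{-1}(\lambda)\varphi^{-1}(c_i)\big),$$
and $\sum_i (v_i \otimes \varphi^{-1}(c_i))$ is an eigenvector of $(1 \otimes \varphi^{-1})\mathcal{S}(1 \otimes \varphi)$ with eigenvalue $\varphi^{-1}(\lambda)$. Since $1 \otimes \varphi^{-1}$ carries a basis of eigenvectors of $\mathcal{S}$ to a basis, it follows that $(1 \otimes \varphi^{-1})\mathcal{S}(1 \otimes \varphi)$ has a basis of eigenvectors. By uniqueness of the decomposition $L \otimes 1 = \mathcal{S} + \mathcal{N}$, we must have $(1 \otimes \varphi^{-1})\mathcal{S}(1 \otimes \varphi) = \mathcal{S}$ and $(1 \otimes \varphi^{-1})\mathcal{N}(1 \otimes \varphi) = \mathcal{N}$. Since $\varphi$ is arbitrary in $\operatorname{Gal}(K/k)$, (b) shows that $\mathcal{S} = S \otimes 1$ and $\mathcal{N} = N \otimes 1$.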

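In (d), $N^n \otimes 1 = (N \otimes 1)^n = \mathcal{N}^n$, and $\mathcal{N}$ nilpotent implies $N$ nilpotent. Similarly $\mathcal{S}\mathcal{N} = \mathcal{N}\mathcal{S}$ implies $SN = NS$. Then the fact that $S^K = S \otimes 1 = \mathcal{S}$ has a basis of eigenvectors implies that $S$ is semisimple.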

In (e), $S \otimes 1$ and $N \otimes 1$ can be expressed uniquely as polynomials in $L \otimes 1$ that are 0 or have degree less than the degree of the minimal polynomial of $L \otimes 1$; the coefficients of these polynomials are in $K$. Application of a member $\varphi$ of $\operatorname{Gal}(K/k)$ to a polynomial expression $S \otimes 1 = P(L \otimes 1)$ just affects the coefficients and gives a polynomial expression for $S \otimes 1$ different from the given one unless $\varphi$ fixes each coefficient. By uniqueness and Theorem 9.38, we see that the coefficients are in $k$. A similar argument applies to $N \otimes 1$.
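The concrete recipe that Problem 61 uses for this decomposition (split the Jordan form into its diagonal and strictly upper triangular parts, then transform each back separately) can be sketched in Python. This is a minimal sketch under stated assumptions: it works over the rationals with exact `Fraction` arithmetic rather than over $\mathbb{C}$, and the Jordan form `J` and Jordan basis `P` are a made-up example, not the matrix of Problem 61; finding `P` in general is the task of Section V.6.

```python
from fractions import Fraction


def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def mat_add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def mat_inv(A):
    """Inverse over Q by Gauss-Jordan elimination (A is assumed invertible)."""
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        p = next(i for i in range(col, n) if M[i][col] != 0)
        M[col], M[p] = M[p], M[col]
        piv = M[col][col]
        M[col] = [x / piv for x in M[col]]
        for i in range(n):
            if i != col and M[i][col] != 0:
                f = M[i][col]
                M[i] = [a - f * b for a, b in zip(M[i], M[col])]
    return [row[n:] for row in M]


# Hypothetical example: Jordan form J with blocks for eigenvalues 2 and 5,
# and a Jordan basis given by the columns of P.
J = [[2, 1, 0], [0, 2, 0], [0, 0, 5]]
P = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]
P_inv = mat_inv(P)
A = mat_mul(mat_mul(P, J), P_inv)

# Split J = D + N_J with D diagonal and N_J strictly upper triangular.
D = [[J[i][j] if i == j else 0 for j in range(3)] for i in range(3)]
N_J = [[J[i][j] if i != j else 0 for j in range(3)] for i in range(3)]

# Transform D and N_J back separately to the standard basis.
S = mat_mul(mat_mul(P, D), P_inv)    # semisimple part
N = mat_mul(mat_mul(P, N_J), P_inv)  # nilpotent part
print("S =", [[str(x) for x in row] for row in S])
print("N =", [[str(x) for x in row] for row in N])
```

The design choice is to keep everything exact, so the defining properties $A = S + N$, $SN = NS$, and $N$ nilpotent can be checked with plain equality.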

60.  This  is  proved  by  the  same  argument  as  for  Problem  13  in  Chapter  V. 

61. The splitting field for the minimal polynomial is $\mathbb{C}$. According to the procedure in the solution of Problem 59, we are first supposed to find a decomposition over $\mathbb{C}$. In a suitable basis (one realizing the Jordan form), $A$ is the sum of a diagonal matrix $D$ and a strictly upper triangular matrix $N$ that commutes with $D$, and this is the Jordan-Chevalley decomposition. Section V.6 shows how to find the Jordan form and the basis over $\mathbb{C}$ in which it is realized. We transform $D$ and $N$ back separately to find the semisimple and nilpotent components of $A$ relative to the standard basis. The result is that
62. In (a), if there were a basis of eigenvectors over $K$, then the fact that the eigenvalues are equal would mean that $A$ is similar to a scalar matrix. This is manifestly not so. Thus $A$ is not semisimple.

In (b), a matrix that commutes with $A$ is necessarily of the form $\begin{pmatrix} a & cx \\ c & a \end{pmatrix}$ and has characteristic polynomial $X^2 + a^2 + c^2x$ since the characteristic is 2. If the matrix is nilpotent, its characteristic polynomial must be $X^2$, so that $a^2 + c^2x = 0$, and then $a = c = 0$. In this case, the matrix is the 0 matrix.

In (c), suppose that $A = S + N$ is a Jordan-Chevalley decomposition. Then (a) says that $A$ is not semisimple and hence cannot be $S$. On the other hand, $N$ commutes with $A$, and (b) says that $N$ has to be 0 and therefore that $A = S$ is the only possibility. The result is a contradiction, and thus there is no Jordan-Chevalley decomposition.

63. This comes down to what is happening in Problem 12 in Chapter V. In terms of matrices, the problem reduces to the case that a square matrix $A$ is upper triangular with a certain nonzero scalar $c$ in every diagonal entry. Then $D = cI$, and $U$ is taken to be $D^{-1}A$.
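The reduced case of Problem 63 can be checked on a small example. The sketch below assumes a hypothetical $3 \times 3$ upper triangular matrix with $c = 3$ on the diagonal and confirms that $D = cI$ and $U = D^{-1}A$ commute, that $DU = A$, and that $U - I$ is nilpotent, so that $U$ is unipotent.

```python
from fractions import Fraction

# Hypothetical example: upper triangular A with the nonzero scalar c on the diagonal.
c = Fraction(3)
A = [[c, 6, 9], [0, c, 12], [0, 0, c]]
n = len(A)


def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


D = [[c if i == j else 0 for j in range(n)] for i in range(n)]              # D = cI
U = [[A[i][j] / c for j in range(n)] for i in range(n)]                     # U = D^{-1} A
E = [[U[i][j] - (1 if i == j else 0) for j in range(n)] for i in range(n)]  # U - I

# U - I is strictly upper triangular, so (U - I)^n = 0 and U is unipotent.
E_cubed = mat_mul(mat_mul(E, E), E)
print(mat_mul(D, U) == A, mat_mul(D, U) == mat_mul(U, D), E_cubed == [[0] * n] * n)
```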

64. In (e), for characteristic $p > 0$, $-1$ is the sum of $p - 1$ copies of $1 = 1^2$. Hence $k$ cannot be formally real.

65. In (e), we have $(b^{-1} - a^{-1})ab = a - b$. The right side is in $P$, and so are $a$ and $b$. Thus the remaining factor, $b^{-1} - a^{-1}$, has to be in $P$. In (f), the sum of $a(c - d) > 0$ and $(a - b)d > 0$ is $ac - bd > 0$. In (g), expansion of $(a - b)(c - d) > 0$ gives $ac + bd > ad + bc$.

66. The definition is that $\dfrac{a_m x^m + \cdots + a_0}{b_n x^n + \cdots + b_0}$, with $a_m \ne 0$ and $b_n \ne 0$, is positive if $a_m b_n^{-1}$ is in $P$. It is routine to check that the set $P'$ of positive elements of $k(x)$ is closed under addition and multiplication, and certainly every nonzero element is in exactly one of $P'$ and $-P'$.

67. In (a), one ordering has $a + b\sqrt{2}$ in $P$ if $a + b\sqrt{2} > 0$ in the ordinary sense, and the other has $a + b\sqrt{2}$ in $P$ if $a - b\sqrt{2} > 0$ in the ordinary sense.

In (b), for any element $a + b\sqrt{c}$ with $a^2 > b^2c$, define $a + b\sqrt{c}$ to be in $P'$ if and only if $a$ is in $P$. For any element $a + b\sqrt{c}$ with $b^2c > a^2$, define $a + b\sqrt{c}$ to be in $P'$ if and only if $b$ is in $P$. Since $c$ is not a square, $a^2 = b^2c$ forces $a = b = 0$, so the only element left undecided by this process is 0, which is not to be in $P'$. The elements $a + b\sqrt{c}$ in $P'$ with $a^2 > b^2c$ will be said to be of type I, while those with $a^2 < b^2c$ will be said to be of type II. It is clear that each nonzero element $x$ of $K$ is in exactly one of $P'$ and $-P'$, and we have to verify that $P'$ is closed under addition and multiplication.

The verification is a little complicated. It uses parts (f) and (g) of Problem 65 repeatedly. Consider addition. There are cases. Case 1 is that $a + b\sqrt{c}$ and $a' + b'\sqrt{c}$ are in $P'$ with both of type I. If the sum is of type II, then addition of $a^2 > b^2c$, $a'^2 > b'^2c$, and $(b + b')^2c > (a + a')^2$ gives $bb'c > aa'$ upon cancellation. Squaring and taking into account that $aa' > 0$, we obtain $(b^2c)(b'^2c) > a^2a'^2$. On the other hand, $a^2 > b^2c$ and $a'^2 > b'^2c$ together imply $a^2a'^2 > (b^2c)(b'^2c)$, contradiction. Thus the sum is of type I. Since $a$ and $a'$ are in $P$, so is $a + a'$. Thus the sum is in $P'$.

Case 2 is that $a + b\sqrt{c}$ and $a' + b'\sqrt{c}$ are in $P'$ with both of type II. If the sum is of type I, then addition of $a^2 < b^2c$, $a'^2 < b'^2c$, and $(b + b')^2c < (a + a')^2$ gives $bb'c < aa'$ upon cancellation. Squaring and taking into account that $bb' > 0$, we obtain $(b^2c)(b'^2c) < a^2a'^2$. On the other hand, $a^2 < b^2c$ and $a'^2 < b'^2c$ together imply $a^2a'^2 < (b^2c)(b'^2c)$, contradiction. Thus the sum is of type II. Since $b$ and $b'$ are in $P$, so is $b + b'$. Thus the sum is in $P'$.
Case 3 is that $a + b\sqrt{c}$ is of type I and $a' + b'\sqrt{c}$ is of type II (or vice versa). The argument now depends on the type of the sum. Case 3A is that the sum is of type I. Adding $a^2 > b^2c$, $b'^2c > a'^2$, and $(a + a')^2 > (b + b')^2c$ and canceling gives $a(a + a') > b(b + b')c$. We want to see that $a + a' > 0$. If $a + a' < 0$, then the left side is negative, and hence both sides are negative. Thus the squares of the two sides are related in the opposite order: $a^2(a + a')^2 < b^2(b + b')^2c^2$. Here the right side is $\le a^2(b + b')^2c$, and we get $(a + a')^2 < (b + b')^2c$, in contradiction to the fact that the sum is of type I. So $a + a'$ is $> 0$, and the sum is in $P'$. Case 3B is that the sum is of type II. Adding $a^2 > b^2c$, $b'^2c > a'^2$, and $(b + b')^2c > (a + a')^2$ and canceling gives $b'(b + b')c > a'(a + a')$. We want to see that $b + b' > 0$. If $b + b' < 0$, then both sides are negative. Thus the squares of the two sides are related in the opposite order: $b'^2(b + b')^2c^2 < a'^2(a + a')^2$. Here the right side is $\le b'^2c(a + a')^2$, and thus $(b + b')^2c < (a + a')^2$, in contradiction to the fact that the sum is of type II. So $b + b'$ is $> 0$, and the sum is in $P'$.

This completes the verification that $P'$ is closed under addition. We now consider multiplication, again dividing matters into cases. Case 1 is that $a + b\sqrt{c}$ and $a' + b'\sqrt{c}$ are in $P'$ with both of type I. Applying Problem 65g to the inequalities $a^2 > b^2c$ and $a'^2 > b'^2c$, we obtain $a^2a'^2 + b^2b'^2c^2 > a^2b'^2c + a'^2b^2c$, which says that the product $(aa' + bb'c) + (ab' + a'b)\sqrt{c}$ is of type I. We are to show that $aa' + bb'c$ is $> 0$. From $a^2 > b^2c$ and $a'^2 > b'^2c$, we obtain $0 < a^2a'^2 - b^2b'^2c^2 = (aa' + bb'c)(aa' - bb'c)$. Thus $aa' + bb'c$ and $aa' - bb'c$ are both $> 0$ or both $< 0$, and they have the same sign as their sum, which is $2aa'$. Since $a$ and $a'$ are in $P$, we have $aa' > 0$, and we conclude that the product is in $P'$.

Case 2 is that $a + b\sqrt{c}$ and $a' + b'\sqrt{c}$ are in $P'$ with both of type II. Applying Problem 65g to the inequalities $b^2c > a^2$ and $b'^2c > a'^2$, we obtain $a^2a'^2 + b^2b'^2c^2 > a^2b'^2c + a'^2b^2c$, which says that the product is of type I. We are to show that $aa' + bb'c$ is $> 0$. From $a^2 < b^2c$ and $a'^2 < b'^2c$, we see that $0 < b^2b'^2c^2 - a^2a'^2 = (bb'c + aa')(bb'c - aa')$. Thus $bb'c + aa'$ and $bb'c - aa'$ are both $> 0$ or both $< 0$, and they have the same sign as their sum, which is $2bb'c$. Since $b$, $b'$, and $c$ are in $P$, we have $bb'c > 0$, and we conclude that the product is in $P'$.

Case 3 is that $a + b\sqrt{c}$ is of type I and $a' + b'\sqrt{c}$ is of type II (or vice versa). From $(a^2 - b^2c)(b'^2c - a'^2) > 0$, we obtain $c(a'^2b^2 + a^2b'^2) > a^2a'^2 + b^2b'^2c^2$. Addition of $2aa'bb'c$ to both sides yields $(ab' + a'b)^2c > (aa' + bb'c)^2$, an inequality that shows the product to be of type II. To show that the product is in $P'$, we are to show that $ab' + a'b > 0$. The product of $a^2 > b^2c$ and $b'^2c > a'^2$ gives $a^2b'^2 > b^2a'^2$ upon cancellation of $c$, $c$ being positive. Then $(ab' + ba')(ab' - ba') > 0$, and the two factors have the same sign. Now $a > 0$ and $b' > 0$ since the given elements are in $P'$. Thus $ab' > 0$. Arguing by contradiction, suppose that $ab' \le ba'$. Then $ab' > 0$ implies $(ab')^2 \le (ba')^2$, in contradiction to $a^2b'^2 > b^2a'^2$. We conclude that $ab' > ba'$, hence that $ab' - ba' > 0$. Thus $ab' + ba' > 0$, as required.

These steps complete all the verifications that $P'$, as we have defined it, is a positive system. It remains to define a second version of $P'$ and to carry out the verifications for it. For the definition, there is no change if $a^2 > b^2c$, but if $b^2c > a^2$, then $a + b\sqrt{c}$ is to be in $P'$ if and only if $-b$ is in $P$. The verifications are essentially unchanged except that the roles of $b$ and $-b$ are interchanged throughout.
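The definition of $P'$ in (b) is mechanical enough to test. The Python sketch below is illustrative only: it takes the ordered field to be $\mathbb{Q}$ with its usual ordering, assumes $c$ is a positive rational that is not a square, and uses hypothetical helper names. For $c = 2$ the two versions of $P'$ should agree with the two orderings found in (a), and on a finite sample of elements both versions should be closed under addition and multiplication.

```python
from fractions import Fraction
import math


def in_P_prime(a, b, c, second=False):
    """Is a + b*sqrt(c) in P'?  Assumes the base field is Q with its usual ordering
    and c > 0 is not a square.  second=True swaps the roles of b and -b."""
    a, b, c = Fraction(a), Fraction(b), Fraction(c)
    if a == 0 and b == 0:
        return False                      # 0 is never in P'
    if a * a > b * b * c:                 # type I: decided by the sign of a
        return a > 0
    return (-b if second else b) > 0      # type II: decided by b (or by -b)


def add(x, y):
    return (x[0] + y[0], x[1] + y[1])


def mul(x, y, c):
    # (a + b sqrt c)(a' + b' sqrt c) = (aa' + bb'c) + (ab' + a'b) sqrt c
    (a, b), (a2, b2) = x, y
    return (a * a2 + b * b2 * c, a * b2 + a2 * b)


def is_positive_system(c, second=False, bound=4):
    """Check closure of P' under + and * on all a + b sqrt(c) with |a|, |b| <= bound."""
    sample = [(a, b) for a in range(-bound, bound + 1) for b in range(-bound, bound + 1)]
    pos = [x for x in sample if in_P_prime(*x, c, second)]
    return all(in_P_prime(*add(x, y), c, second) and in_P_prime(*mul(x, y, c), c, second)
               for x in pos for y in pos)


print(in_P_prime(3, -2, 2), 3 - 2 * math.sqrt(2) > 0)
print(is_positive_system(2), is_positive_system(2, second=True), is_positive_system(3))
```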

68. In (a), the integer $n$, if it exists, cannot be 0 because $k$ is formally real by Problem 64d. So $n \ge 1$. We write each element being squared as $a_j + b_j\sqrt{c_n}$ with each $a_j$ and $b_j$ in $k(\sqrt{c_1}, \dots, \sqrt{c_{n-1}})$ and expand out the squares.

In (b), let $k' = k(\sqrt{c_1}, \dots, \sqrt{c_{n-1}})$. If the coefficient of $\sqrt{c_n}$ is 0, then (*) becomes an equality in $k'$ that exhibits $k'$ as not formally real, in contradiction to the definition of $n$. If the coefficient of $\sqrt{c_n}$ is not 0, then (*) exhibits $\sqrt{c_n}$ as a member of $k'$, again in contradiction to the definition of $n$. The conclusion is that $K$ is formally real.

69. Order the formally real subfields of $\bar k$ by inclusion upward. The set of such subfields is nonempty since $k$ is one. The union of a chain of such subfields is again such a subfield because any expression of $-1$ as a sum of squares involves only finitely many elements and would therefore already be valid in a single member of the chain. By Zorn's Lemma, there is a maximal element $K$. By maximality, $K$ is a real closed field.

70. In (a), if $c$ is not a square in $K$, then $K(\sqrt{c})$ is a proper algebraic extension of $K$. Since $K$ is maximal among formally real subfields of $\bar k$, $K(\sqrt{c})$ is not formally real. Therefore $-1$ is a sum of squares in $K(\sqrt{c})$, as indicated.

In (b), expansion gives $-1 = \sum a_j^2 + c\sum b_j^2 + 2\sqrt{c}\sum a_jb_j$. Equating coefficients of 1 and $\sqrt{c}$ shows that $-1 = \sum a_j^2 + c\sum b_j^2$. We cannot have all $b_j = 0$ because otherwise we would have $-1 = \sum a_j^2$ and $K$ would not be formally real. Thus $-c = (1 + \sum a_j^2)/\sum b_j^2$, and $-c$ is exhibited as a sum of squares (a quotient $s/t$ of sums of squares equals $st \cdot (1/t)^2$), hence a member of $P$. Thus if $c$ is not a square in $K$, then $c$ cannot be a sum of squares in $K$, since otherwise $-1 = (-c)c^{-1}$ would be a sum of squares. The contrapositive is: every sum of squares in $K$ is a square in $K$.

In (c), the equality $-c = (1 + \sum a_j^2)/\sum b_j^2$, in view of (b), exhibits $-c$ as the quotient of two squares, hence as a square.

In (d), let $P$ be the set of nonzero squares. We see from (a) through (c) that every nonzero element is in $P$ or in $-P$. By (b), every sum of squares is a square, and a sum of nonzero squares is nonzero since $K$ is formally real; thus $P$ is closed under addition. It is clear that $P$ is closed under multiplication. Thus $K$ becomes an ordered field. Problem 64b shows that every nonzero square has to be positive in any ordering of $K$, and thus $P$ is the only possibility for the set of positive elements.

71. In (a), let $n$ be the least odd positive integer such that some polynomial over $K$ of degree $n$ has no root in $K$. If this polynomial were reducible, some factor of it would have smaller odd degree and would have a root. So the polynomial in question has to be irreducible.

In (b), if $-1$ is a sum of squares in $K(\alpha)$, then we have $-1 = \sum_{j=1}^k R_j(\alpha)^2$ for suitable polynomials $R_j(X)$ in $K[X]$, which may be taken to have degree $\le n - 1$. In other words, $\sum_{j=1}^k R_j(X)^2 + 1$ is a member of $K[X]$ that vanishes at $\alpha$. Since $Q(X)$ is the minimal polynomial of $\alpha$, $Q(X)$ divides $\sum_{j=1}^k R_j(X)^2 + 1$, and we obtain
$$-1 = \sum_{j=1}^k R_j(X)^2 + Q(X)A(X)$$
for a suitable polynomial $A(X)$ in $K[X]$ of degree $\le n - 2$.


In (c), the equality of the coefficients of $X^{2n-2}$ in the polynomial identity of (b) shows that 0 equals the sum of the squares of the coefficients of $X^{n-1}$ in the $R_j$'s plus the coefficient of $X^{n-2}$ in $A(X)$. The coefficient of $X^{n-2}$ in $A(X)$ cannot be 0, or else $-1$ would be exhibited as a sum of squares in $K$. Thus $A(X)$ has degree exactly $n - 2$, which is odd. The minimality of $n$ in (a) applies to $A(X)$ and says that $A(X)$ has a root $r$. We evaluate the polynomial identity from (b) at $r$, take into account that $A(r) = 0$, and obtain $-1 = \sum_{j=1}^k R_j(r)^2$. Again we have a contradiction to the fact that $K$ is formally real, and thus the minimal integer $n$ in (a) cannot exist.

72.  The  indicated  proof  goes  through  without  essential  change. 

73. Problem 68 shows that $k$ is contained in a certain formally real subfield $L$ of $\bar k$, Problem 69 shows that $L$ is contained in a real closed subfield $K$ of $\bar k$, and Problem 70 shows that $K$ becomes an ordered field. The set of positive elements for $K$ includes all nonzero squares by Problem 64b, and all the members of $k$ in $P$ have become squares in $L$ by definition of $L$. Therefore all members of $P$ are squares in $K$. The fact that $\bar k = K(\sqrt{-1})$ follows from Problem 72.


Chapter  X 

1. If $R$ is a field, then the only ideals are 0 and $R$, and they certainly satisfy the descending chain condition. Conversely if the ideals satisfy the descending chain condition, then there is a minimal nonzero ideal $I$. Fix $m \ne 0$ in $I$. For any nonzero element $a \in I$, $Ra = I$ since $I$ is a simple module. If $x \ne 0$ is in $R$, we apply this observation to $xm$, which is nonzero since $R$ is an integral domain. Since $Rxm = I$, there exists $y$ in $R$ with $yxm = m$. Then $(1 - yx)m = 0$. Since $R$ is an integral domain and $m \ne 0$, we obtain $1 - yx = 0$. Therefore $y = x^{-1}$.

2. In (a), let $C_2 = \{\pm 1\}$. Define $r(1) = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$ and $r(-1) = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$. Then $r$ is a representation since $1 + 1 = 0$ in $F$. The subspace $U$ spanned by $(1, 0)$ is invariant. If there were a complementary invariant subspace, there would be an eigenvector of $r(-1)$ not in $U$. However, the roots of the characteristic polynomial are both 1, and a second, linearly independent eigenvector would mean that $r(-1)$ is the identity, which it is not. For (b), the representation in (a) makes $F^2$ into a unital left $R$ module, the $R$ submodules being the invariant subspaces. There is no complementary $R$ submodule to $U$, and hence $F^2$ is not semisimple as an $R$ module.
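A brute-force check of (a) is possible when the field is as small as it can be. The sketch below assumes $F = \mathbb{F}_2$ (the hint only uses $1 + 1 = 0$) and $r(-1) = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$; the helper names are hypothetical. It confirms that $r(-1)^2 = r(1)$ and that, of the three lines in $\mathbb{F}_2^2$, only the span of $(1, 0)$ is invariant, so that line has no invariant complement.

```python
# Assumption for illustration: F = F_2 and r(-1) = [[1, 1], [0, 1]].
I2 = ((1, 0), (0, 1))
R_MINUS_ONE = ((1, 1), (0, 1))


def mat_mul_mod2(A, B):
    return tuple(tuple(sum(A[i][k] * B[k][j] for k in range(2)) % 2 for j in range(2))
                 for i in range(2))


def apply(A, v):
    return tuple(sum(A[i][k] * v[k] for k in range(2)) % 2 for i in range(2))


# r(-1)^2 must equal r(1) = I for r to be a homomorphism from C_2.
is_representation = mat_mul_mod2(R_MINUS_ONE, R_MINUS_ONE) == I2

# Over F_2 each line (1-dimensional subspace) is {0, v} for a nonzero v.
nonzero = [(1, 0), (0, 1), (1, 1)]
invariant_lines = [v for v in nonzero if apply(R_MINUS_ONE, v) in {v, (0, 0)}]
print(is_representation, invariant_lines)
```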

3. If $\{a_s\}$ is a set of generators of $M$ as a $\mathbb{Z}$ module and $\{b_t\}$ is a set of generators of $N$ as a $\mathbb{Z}$ module, then $\{a_s \otimes b_t\}$ is a set of generators of $M \otimes_{\mathbb{Z}} N$ as an abelian group. Then (a) follows from this fact and the fact that 1 generates both $\mathbb{Z}/k\mathbb{Z}$ and $\mathbb{Z}/l\mathbb{Z}$.

In (b), if $l = dk$ for some $d$ and if $b$ has $b = qk + r$ with $0 \le r < |k|$, then $a1 \otimes b1 = aqk(1 \otimes 1) + (a1 \otimes r1) = aq(k1 \otimes 1) + (a1 \otimes r1) = a1 \otimes r1$, and it follows that the map $a1 \otimes b1 \mapsto a1 \otimes (b \bmod k)1$ is a well-defined group isomorphism of $(\mathbb{Z}/k\mathbb{Z}) \otimes_{\mathbb{Z}} (\mathbb{Z}/l\mathbb{Z})$ onto $(\mathbb{Z}/k\mathbb{Z}) \otimes_{\mathbb{Z}} (\mathbb{Z}/k\mathbb{Z})$.

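In (c), let $\beta(x1, y1) = xy \bmod k$ for integers $x$ and $y$. This is $\mathbb{Z}$ bilinear from $\mathbb{Z}/k\mathbb{Z} \times \mathbb{Z}/k\mathbb{Z}$ into $\mathbb{Z}/k\mathbb{Z}$ and extends to a group homomorphism $L : \mathbb{Z}/k\mathbb{Z} \otimes_{\mathbb{Z}} \mathbb{Z}/k\mathbb{Z} \to \mathbb{Z}/k\mathbb{Z}$ with $L(x1 \otimes y1) = xy \bmod k$. In particular, $L(1 \otimes 1) = 1 \bmod k$. Therefore $k$ divides the order of $1 \otimes 1$, and $\mathbb{Z}/k\mathbb{Z} \otimes_{\mathbb{Z}} \mathbb{Z}/k\mathbb{Z}$ has at least $|k|$ elements.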

In (d), we have $0 = k1 \otimes 1 = k(1 \otimes 1)$ and $0 = 1 \otimes l1 = l(1 \otimes 1)$. If $xk + yl = d$, then $d(1 \otimes 1) = x(k(1 \otimes 1)) + y(l(1 \otimes 1)) = 0$. Hence $1 \otimes 1$ has order dividing $d$. By (c), $1 \otimes 1$ has order at least $|d|$. The result follows.
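Parts (c) and (d) combine into a two-sided bound on the order of $1 \otimes 1$ in $\mathbb{Z}/k\mathbb{Z} \otimes_{\mathbb{Z}} \mathbb{Z}/l\mathbb{Z}$: Bézout coefficients $xk + yl = d$ give the upper bound $d$, and a pairing onto $\mathbb{Z}/d\mathbb{Z}$ (a variant of the one in (c)) that sends $1 \otimes 1$ to $1$ gives the lower bound. The Python sketch below checks both ingredients by brute force for positive $k$ and $l$. It is a numerical illustration with hypothetical helper names, not a computation of the tensor product itself.

```python
from math import gcd


def bezout(k: int, l: int):
    """Extended Euclid for positive k, l: return (d, x, y) with x*k + y*l == d == gcd(k, l)."""
    if l == 0:
        return k, 1, 0
    d, x, y = bezout(l, k % l)
    return d, y, x - (k // l) * y


def pairing_well_defined(k: int, l: int, m: int) -> bool:
    """Is (x mod k, y mod l) -> x*y mod m independent of the chosen representatives?"""
    return all((x + k) * y % m == x * y % m == x * (y + l) % m
               for x in range(k) for y in range(l))


def order_of_one_tensor_one(k: int, l: int) -> int:
    d, x, y = bezout(k, l)
    assert x * k + y * l == d                # upper bound: d(1 (x) 1) = 0, as in (d)
    assert pairing_well_defined(k, l, d)     # lower bound: 1 (x) 1 maps to 1 mod d
    return d


print(order_of_one_tensor_one(12, 18), gcd(12, 18))
```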

4. In (a), each $\ker\varphi^n$ is an $R$ submodule of $M$, and these $R$ submodules form an ascending chain. Hence they are the same from some point on. Similarly each $\operatorname{image}\varphi^n$ is an $R$ submodule of $M$, and these form a descending chain. Hence they are the same from some point on.

In (b), if $x$ is in $\mathcal{K} \cap \mathcal{I}$, then $\varphi^N x = 0$ and $x = \varphi^N y$ for some $y$. Then $0 = \varphi^N x = \varphi^{2N} y$. Since $y$ is in $\ker\varphi^{2N} = \ker\varphi^N$, we obtain $0 = \varphi^N y = x$, and $x = 0$.

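In (c), if $x$ is in $M$, then $\varphi^N x$ is in $\operatorname{image}\varphi^N = \operatorname{image}\varphi^{2N}$. Hence $\varphi^N x = \varphi^{2N} z = \varphi^N(\varphi^N z)$ for some $z \in M$, and $\varphi^N x = \varphi^N y$ with $y = \varphi^N z$.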

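For (d), if $x$ is in $M$, let $y$ be as in (c), and write $x = (x - y) + y$. Then $\varphi^N(x - y) = \varphi^N x - \varphi^N y = 0$ and $y = \varphi^N z$ show that $x - y$ is in $\mathcal{K}$ and $y$ is in $\mathcal{I}$. Thus $M = \mathcal{K} + \mathcal{I}$. Since $\mathcal{K} \cap \mathcal{I} = 0$ by (b), $M = \mathcal{K} \oplus \mathcal{I}$.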

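In (e), we know that $\varphi(\operatorname{image}\varphi^n) = \operatorname{image}\varphi^{n+1}$ for all $n$. Taking $n \ge N$, we see that $\varphi(\mathcal{I}) = \mathcal{I}$. From (b), $\ker(\varphi|_{\mathcal{I}}) \subseteq \mathcal{K} \cap \mathcal{I} = 0$. Therefore $\varphi$ is one-one from $\mathcal{I}$ onto itself. In addition, $\varphi(\ker\varphi^n) \subseteq \ker\varphi^{n-1}$ for all $n$. Taking $n > N$ shows that $\varphi(\mathcal{K}) \subseteq \mathcal{K}$. For $x$ in $\mathcal{K}$, we have $\varphi^N x = 0$. Therefore $(\varphi|_{\mathcal{K}})^N = 0$.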
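For a linear map on a finite-dimensional vector space, the decomposition of Problem 4 (essentially Fitting's lemma) can be computed directly. The sketch below works over $\mathbb{Q}$ with exact `Fraction` arithmetic, which is an assumption made for illustration; the problem itself allows any module satisfying both chain conditions. It takes $N$ to be the first exponent with $\operatorname{rank}\varphi^N = \operatorname{rank}\varphi^{N+1}$ and checks that bases of $\mathcal{K} = \ker\varphi^N$ and $\mathcal{I} = \operatorname{image}\varphi^N$ together span the whole space. The matrix is a made-up example.

```python
from fractions import Fraction


def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def rref(M):
    """Reduced row echelon form over Q and the list of pivot columns."""
    R = [[Fraction(x) for x in row] for row in M]
    pivots, r = [], 0
    for col in range(len(R[0])):
        p = next((i for i in range(r, len(R)) if R[i][col] != 0), None)
        if p is None:
            continue
        R[r], R[p] = R[p], R[r]
        piv = R[r][col]
        R[r] = [x / piv for x in R[r]]
        for i in range(len(R)):
            if i != r and R[i][col] != 0:
                f = R[i][col]
                R[i] = [a - f * b for a, b in zip(R[i], R[r])]
        pivots.append(col)
        r += 1
        if r == len(R):
            break
    return R, pivots


def rank(M):
    return len(rref(M)[1])


def kernel_basis(M):
    R, pivots = rref(M)
    n = len(M[0])
    basis = []
    for f in (c for c in range(n) if c not in pivots):   # one vector per free column
        v = [Fraction(0)] * n
        v[f] = Fraction(1)
        for row, pc in zip(R, pivots):
            v[pc] = -row[f]
        basis.append(v)
    return basis


def image_basis(M):
    _, pivots = rref(M)            # pivot columns of M span its image
    return [[Fraction(M[i][c]) for i in range(len(M))] for c in pivots]


def fitting_decomposition(A):
    """Return (N, basis of ker A^N, basis of image A^N) for the first N with
    rank(A^N) == rank(A^(N+1)); from there on kernels and images are stable."""
    P, N = A, 1
    while rank(mat_mul(P, A)) != rank(P):
        P, N = mat_mul(P, A), N + 1
    return N, kernel_basis(P), image_basis(P)


# Hypothetical example: a nilpotent 2x2 block plus an invertible 1x1 block.
A = [[0, 1, 0], [0, 0, 0], [0, 0, 2]]
N, K, I = fitting_decomposition(A)
print(N, len(K), len(I), rank(K + I))
```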

5. If (i) holds, then $\psi|_{N'}$ is one-one from $N'$ onto $P$. Let $\sigma$ be its inverse. Then $\sigma : P \to N'$ is one-one with $\psi\sigma = 1_P$. So (ii) holds.

If (ii) holds, then any $n$ in $N$ has the property that $n - \sigma\psi(n)$ has $\psi(n - \sigma\psi(n)) = \psi(n) - \psi\sigma\psi(n) = 0$ and is therefore in $\operatorname{image}\varphi$. Write $n - \sigma\psi(n) = \varphi(m)$ for some $m$ depending on $n$; $m$ is unique since $\varphi$ is one-one. If $\tau : N \to M$ is defined by $\tau(n) = m$, then $\tau$ is an $R$ homomorphism by the uniqueness of $m$. Consider $\tau(\varphi(m))$ for $m$ in $M$. The element $n = \varphi(m)$ has $n - \sigma\psi(n) = \varphi(m) - \sigma\psi\varphi(m) = \varphi(m) - \sigma(0) = \varphi(m)$, and the definition of $\tau$ says that $\tau(\varphi(m)) = m$. Hence $\tau\varphi = 1_M$, and (iii) holds.

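If (iii) holds, then $N' = \ker\tau$ is an $R$ submodule of $N$. If $n$ is in $N' \cap \operatorname{image}\varphi$, then $n = \varphi(m)$ for some $m \in M$ and also $0 = \tau(n) = \tau\varphi(m) = 1_M(m) = m$. So $n = 0$, and $N' \cap \operatorname{image}\varphi = 0$. If $n \in N$ is given, write $n = (n - \varphi\tau(n)) + \varphi\tau(n)$. Then $\varphi\tau(n)$ is certainly in $\operatorname{image}\varphi$, and $\tau(n - \varphi\tau(n)) = \tau(n) - 1_M\tau(n) = 0$ shows that $n - \varphi\tau(n)$ is in $N'$. Therefore $N = N' \oplus \operatorname{image}\varphi$. Since $\operatorname{image}\varphi = \ker\psi$, we see that $N = N' \oplus \ker\psi$ and that (i) holds.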

6. For (a), the conjugation mapping $C$ on $A$, carrying 1 to itself and carrying $i$, $j$, and $k$ to their negatives, respects addition and satisfies $C(xy) = C(y)C(x)$. Hence it exhibits $R$ and $R^{o}$ as isomorphic. Then the result follows from Proposition 10.14.


For (b), again by Proposition 10.14, we need a noncommutative ring $R$ with identity such that $R$ is not isomorphic to $R^{o}$. Let $F$ be a field with two elements, and let $R$ be the 8-element ring consisting of all matrices $\begin{pmatrix} a & b \\ 0 & c \end{pmatrix}$ with $a, b, c$ in $F$. Define $x$ to be the matrix with $a = 1$ and $b = c = 0$, and define $y$ to be the matrix with $b = 1$ and $a = c = 0$. Computation shows that $x^2 = x$, $y^2 = 0$, $xy = y$, and $yx = 0$. A ring isomorphism of $R$ with $R^{o}$ is the same as an additive isomorphism that reverses the order of multiplication, and we call this an "antiautomorphism" of $R$. Suppose that an antiautomorphism $\varphi$ of $R$ exists. We must have $\varphi(1) = 1$. Suppose that $\varphi(x) = u$ and $\varphi(y) = v$. Then $u = \varphi(x) = \varphi(x^2) = \varphi(x)^2 = u^2$ and $0 = \varphi(y^2) = \varphi(y)^2 = v^2$. Expanding $u$ and $v$ in terms of the basis $\{1, x, y\}$ and computing, we find that $u = k1 + lx$ and $v = my$ with $k, l, m$ in $F$. Since $\varphi$ reverses the order of multiplication, we have $uv = \varphi(x)\varphi(y) = \varphi(yx) = \varphi(0) = 0$. Thus $0 = (k1 + lx)(my) = km1 + lmxy = (km)1 + (lm)y$, and $km = lm = 0$. Therefore either $m = 0$ or $k = l = 0$. In the first case, $\varphi(y) = v = my = 0$; in the second case $\varphi(x) = u = k1 + lx = 0$. In either case, $\varphi$ fails to be one-one. We conclude that no antiautomorphism $\varphi$ of $R$ exists.

7.  Take  the  sum  of  all  simple  R  submodules  of  M. 

8. Example 4 in Section 5 shows that $A \otimes_F K$ is a vector space over $K$ in such a way that $k_0(a \otimes k) = a \otimes k_0k$. It is therefore enough to show that the multiplication is $K$ linear in each variable of the product. Additivity is known, and it is enough to check that
$$k_0\big((a_1 \otimes k_1)(a_2 \otimes k_2)\big) = \big(k_0(a_1 \otimes k_1)\big)(a_2 \otimes k_2) = (a_1 \otimes k_1)\big(k_0(a_2 \otimes k_2)\big).$$
Since scalar multiplication by $k_0$ equals left multiplication by $1 \otimes k_0$, the left equality is immediate from associativity of multiplication, and the right equality follows from associativity and from the formula $(a_1 \otimes k_1)(1 \otimes k_0) = a_1 \otimes k_1k_0 = a_1 \otimes k_0k_1 = (1 \otimes k_0)(a_1 \otimes k_1)$.

9. Define $\mu(x)(y) = [x, y]$ for $x$ and $y$ in $\mathfrak{g}$, and let $\nu(c)(d) = cd$ for $c$ and $d$ in $L$. Then $\mu(x) : \mathfrak{g} \to \mathfrak{g}$ and $\nu(c) : L \to L$ are $K$ linear. Therefore $b(x, c) = \mu(x) \otimes \nu(c)$ is $K$ bilinear from $\mathfrak{g} \times L$ into the $K$ vector space $\operatorname{End}_K(\mathfrak{g} \otimes_K L)$, and it extends to a $K$ linear mapping $\mathcal{L} : \mathfrak{g} \otimes_K L \to \operatorname{End}_K(\mathfrak{g} \otimes_K L)$. Define $[X, Y] = \mathcal{L}(X)(Y)$.

With the Lie algebra multiplication now well defined in $\mathfrak{g} \otimes_K L$, one readily checks the two required properties. Therefore $\mathfrak{g} \otimes_K L$ is a Lie algebra over $K$.

Meanwhile, we know that $\mathfrak{g} \otimes_K L$ is a vector space over $L$ because of a change of rings. To complete the proof, we need to show that the multiplication is $L$ linear, not just $K$ linear. It is enough to check $L$ linearity in the second variable because of the alternating property. Let $s$ be in $L$, and let $x \otimes c$ and $y \otimes d$ be elements of $\mathfrak{g} \otimes_K L$. Then we have $[x \otimes c, s(y \otimes d)] = [x \otimes c, y \otimes sd] = [x, y] \otimes csd = s([x, y] \otimes cd) = s[x \otimes c, y \otimes d]$. Forming $K$ linear combinations, we obtain the desired $L$ linearity in the second variable of the Lie algebra product.

10. This problem will follow from the uniqueness of the tensor product as given in Theorem 10.18 if it is shown that $((A \otimes_{\mathbb{Z}} B)/H,\ q \circ b_2)$ is a tensor product of $A$ and $B$ over $R$. Thus let $\beta : A \times B \to G$ be an $R$ bilinear function from $A \times B$ into an abelian group $G$. Since $\beta$ is automatically $\mathbb{Z}$ bilinear, there exists a group homomorphism $\varphi : A \otimes_{\mathbb{Z}} B \to G$ such that $\varphi(a \otimes b) = \beta(a, b)$ for all $a \in A$ and $b \in B$. Then $\varphi(ar \otimes b - a \otimes rb) = \varphi(ar \otimes b) - \varphi(a \otimes rb) = \beta(ar, b) - \beta(a, rb)$. The right side is 0 because $\beta$ is $R$ bilinear, and hence $\varphi$ vanishes on $H$ and descends to a group homomorphism $\bar\varphi : (A \otimes_{\mathbb{Z}} B)/H \to G$ such that $\bar\varphi q = \varphi$. Then $\beta(a, b) = \varphi(a \otimes b) = \bar\varphi q b_2(a, b)$ shows that $\bar\varphi(q b_2) = \beta$. Thus $\bar\varphi$ is the required additive extension of $\beta$. For uniqueness, suppose $\bar\varphi'$ is a second additive extension of $\beta$. Then $\bar\varphi' q b_2(a, b) = \bar\varphi q b_2(a, b)$ for all $a \in A$ and $b \in B$, and hence $\bar\varphi' q(a \otimes b) = \bar\varphi q(a \otimes b)$. The elements $a \otimes b$ generate $A \otimes_{\mathbb{Z}} B$, and hence $\bar\varphi' q = \bar\varphi q$ on $A \otimes_{\mathbb{Z}} B$. Since $q$ maps onto $(A \otimes_{\mathbb{Z}} B)/H$, $\bar\varphi' = \bar\varphi$ on $(A \otimes_{\mathbb{Z}} B)/H$.

11. We are to show that if $C$ is a commutative associative $R$ algebra with identity and if $\varphi_1 : A_1\to C$ and $\varphi_2 : A_2\to C$ are homomorphisms of commutative associative $R$ algebras with identity, then there exists a unique homomorphism $\varphi : A_1\otimes_R A_2\to C$ of $R$ algebras with identity such that $\varphi i_1 = \varphi_1$ and $\varphi i_2 = \varphi_2$. Define $b(a_1,a_2) = \varphi_1(a_1)\varphi_2(a_2)$. This is $R$ bilinear into $C$ because $b(a_1r, a_2) = \varphi_1(a_1r)\varphi_2(a_2) = \varphi_1(a_1)r\varphi_2(a_2) = \varphi_1(a_1)\varphi_2(ra_2) = b(a_1, ra_2)$, and hence there exists a unique homomorphism $\varphi : A_1\otimes_R A_2\to C$ of abelian groups such that $\varphi(a_1\otimes a_2) = b(a_1,a_2) = \varphi_1(a_1)\varphi_2(a_2)$. Then $\varphi i_1(a_1) = \varphi(a_1\otimes 1) = \varphi_1(a_1)\varphi_2(1) = \varphi_1(a_1)1 = \varphi_1(a_1)$, and $\varphi i_1 = \varphi_1$. Similarly $\varphi i_2 = \varphi_2$. To complete the proof, it is enough to show that the homomorphism $\varphi$ of abelian groups is a homomorphism of $R$ algebras. The fact that $\varphi$ is a homomorphism of $R$ modules is immediate from Corollary 10.19. Also, $\varphi(1\otimes 1) = \varphi_1(1)\varphi_2(1) = 1$ shows that $\varphi$ carries identity to identity. Finally the computation
$$\varphi((a_1\otimes a_2)(a_1'\otimes a_2')) = \varphi(a_1a_1'\otimes a_2a_2') = \varphi_1(a_1a_1')\varphi_2(a_2a_2') = \varphi_1(a_1)\varphi_1(a_1')\varphi_2(a_2)\varphi_2(a_2') = \varphi_1(a_1)\varphi_2(a_2)\varphi_1(a_1')\varphi_2(a_2') = \varphi(a_1\otimes a_2)\varphi(a_1'\otimes a_2')$$
shows that $\varphi$ respects multiplication on a set of additive generators of $A_1\otimes_R A_2$.
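A concrete instance of this computation can be worked through directly. This is an illustration, not part of the hint. It assumes $R = \mathbb Q$, $A_1 = \mathbb Q[X]$, $A_2 = \mathbb Q[Y]$, $C = \mathbb Q$, with $\varphi_1$ and $\varphi_2$ the evaluations $X\mapsto a$ and $Y\mapsto b$. It also assumes the monomials $X^i\otimes Y^j$ form a $\mathbb Q$ basis of $A_1\otimes_{\mathbb Q} A_2$, and it stores elements as coefficient dictionaries. The names `mul`, `make_phi`, and the sample elements are hypothetical. The map $\varphi$ is defined on additive generators by $\varphi(a_1\otimes a_2) = \varphi_1(a_1)\varphi_2(a_2)$, and the sketch checks that it sends identity to identity and respects multiplication.

```python
from fractions import Fraction
from typing import Callable, Dict, Tuple

# An element of Q[X] (x)_Q Q[Y], stored as {(i, j): coefficient of X^i (x) Y^j}.
Tensor = Dict[Tuple[int, int], Fraction]


def mul(u: Tensor, v: Tensor) -> Tensor:
    """Product determined by (X^i (x) Y^j)(X^k (x) Y^m) = X^(i+k) (x) Y^(j+m)."""
    out: Tensor = {}
    for (i, j), cu in u.items():
        for (k, m), cv in v.items():
            key = (i + k, j + m)
            out[key] = out.get(key, Fraction(0)) + cu * cv
    return out


def make_phi(a: Fraction, b: Fraction) -> Callable[[Tensor], Fraction]:
    """Additive map with phi(p (x) q) = phi1(p) * phi2(q), where phi1 evaluates
    X at a and phi2 evaluates Y at b (hypothetical choices for this sketch)."""
    def phi(u: Tensor) -> Fraction:
        return sum((coeff * a**i * b**j for (i, j), coeff in u.items()), Fraction(0))
    return phi


phi = make_phi(Fraction(2), Fraction(-1, 3))
one: Tensor = {(0, 0): Fraction(1)}                        # 1 (x) 1
u: Tensor = {(1, 0): Fraction(3), (0, 2): Fraction(1, 2)}  # 3(X (x) 1) + (1/2)(1 (x) Y^2)
v: Tensor = {(2, 1): Fraction(-1), (0, 0): Fraction(5)}    # -(X^2 (x) Y) + 5(1 (x) 1)

print(phi(one), phi(mul(u, v)) == phi(u) * phi(v))
```

The script prints `1 True`. Exact `Fraction` arithmetic is used so that the multiplicativity check is an equality, not an approximation.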

12. Part (a) is immediate from Proposition 10.1. If $\psi$ is a nonzero map in $M^E$, then $\psi(E)$ is a submodule of $M$ isomorphic to $E$, since $E$ is simple. Hence $\psi(E)\subseteq M_E$ by construction, and (b) follows. Part (c) is immediate from (b).

13. With $d\in D_E = \operatorname{Hom}_R(E,E)$, we can form $\psi d = \psi\circ d$ if $\psi$ is in $\operatorname{Hom}_R(E, M_E)$, and we can form $de = d(e)$ if $e$ is in $E$. These definitions give the required unital $D_E$ module structures for (a) and (b). The members of $D_E = \operatorname{Hom}_R(E,E)$ commute with the left $R$ action on $E$ by definition, and this is (c).

14. In view of (c) in the previous problem, the left action of $R$ on $E$ can be regarded as a right $R^o$ action on $E$ in such a way that it commutes with the left $D_E$ action on $E$. In other words, $E$ is a unital $(D_E, R^o)$ bimodule. Corollary 10.19b shows that $M^E\otimes_{D_E} E$ becomes a unital right $R^o$ module, hence a unital left $R$ module.

15. Define a map $b : M^E\times E\to M$, additive in each variable, by $b(\psi, e) = \psi(e)$. For $d$ in $D_E$, this has $b(\psi\circ d, e) = (\psi\circ d)(e) = \psi(d(e)) = b(\psi, d(e))$. Hence $b$ is $D_E$ bilinear and has an additive extension $\Phi : M^E\otimes_{D_E} E\to M$ with $\Phi(\psi\otimes e) = \psi(e)$.

The map $\Phi$ is $R$ linear since $\Phi(r(\psi\otimes e)) = \Phi(\psi\otimes re) = \psi(re) = r(\psi(e)) = r(\Phi(\psi\otimes e))$. Since $\psi$ is in $M^E$, $\psi(e)$ is in $M_E$; thus $\Phi$ has image in $M_E$.


To see that $\Phi$ is onto $M_E$, write $M_E = \bigoplus_{s\in T} M_s$ with each $M_s$ simple, and fix an isomorphism $\alpha_s\in\operatorname{Hom}_R(E, M_s)$ for each $s\in T$. For any element $m\in M_E$, we can find a finite subset $T'$ of $T$ such that $m = \sum_{s\in T'} m_s$ with $m_s\in M_s$. If we let $e_s = \alpha_s^{-1}(m_s)$, then $\Phi\big(\sum_{s\in T'}\alpha_s\otimes e_s\big) = m$. Thus $\Phi$ maps onto $M_E$.

To see that $\Phi$ is one-one, we observe from Problem 12 and Lemma 10.3 that
$$M^E = \operatorname{Hom}_R(E, M) = \operatorname{Hom}_R(E, M_E) = \operatorname{Hom}_R\Big(E, \bigoplus_{s\in T} M_s\Big) = \bigoplus_{s\in T}\operatorname{Hom}_R(E, M_s).$$
Each summand on the right side is isomorphic to $D_E$. That is, the collection of isomorphisms $\{\alpha_s\}_{s\in T}$ from the previous paragraph is a basis of $M^E$ as a right $D_E$ vector space. Consequently every element of $M^E\otimes_{D_E} E$ may be written as a finite sum $\sum_s\alpha_s\otimes e_s$ with $e_s\in E$. The image of the element $\sum_s\alpha_s\otimes e_s$ is $\sum_s\alpha_s(e_s)$. If this is 0, then each $\alpha_s(e_s)$ is 0 because of the independence of the $M_s$'s. Since $\alpha_s$ is an isomorphism, it follows that $e_s = 0$ for each $s$. Therefore $\sum_s\alpha_s\otimes e_s = 0$. Thus $\Phi$ is one-one.

16. The composition in one order is
$$N\mapsto\operatorname{Hom}_R(E, N)\mapsto\operatorname{Hom}_R(E, N)\otimes_{D_E} E. \qquad (*)$$
For $N = M_E$, the map $\Phi$, when applied to the composition, recovers $M_E$, since Problem 15 says that $\Phi$ is onto. For general $N$, we can write $M_E = N\oplus N'$. When we apply $\Phi$ to $(*)$ for $N$ and $N'$ separately, we recover $R$ submodules of $N$ and $N'$, respectively. To have a match for all of $M_E$, we must recover all of $N$ and $N'$.

The composition in the other order is
$$W\mapsto W\otimes_{D_E} E\mapsto\operatorname{Hom}_R(E, W\otimes_{D_E} E). \qquad (**)$$
For $W = M^E$, the image corresponds under the map $\operatorname{Hom}(1,\Phi)$ to $\operatorname{Hom}_R(E, M_E) = M^E$. For general $W$, we can write $M^E = W\oplus W'$. When we apply $\operatorname{Hom}(1,\Phi)$ to $(**)$ for $W$, we get a $D_E$ subspace of $M^E$ that contains $W$. In fact, for any $\psi\in W$, $\operatorname{Hom}_R(E, W\otimes_{D_E} E)$ contains the map $e\mapsto\psi\otimes e$. Composing with $\Phi$ gives $e\mapsto\psi(e)$, which is $\psi$ itself. Thus the members of $W$ are in the image. Similarly the members of $W'$ are in the image for $W'$. The direct sum of the images must be $M^E$, and thus the images must be exactly $W$ and $W'$.

17. The computation
$$\varphi(\Phi_M(\psi\otimes e)) = \varphi(\psi(e)) = (\varphi\circ\psi)(e) = \Phi_N((\varphi\circ\psi)\otimes e) = \Phi_N(\varphi^E(\psi)\otimes e)$$
proves the formula in the last line of the statement of the problem. For the inverse, suppose we are given a map $\tau\in\operatorname{Hom}_{D_E}(M^E, N^E)$. Then $\tau$ induces an $R$ linear map
$$\tau'_E : M^E\otimes_{D_E} E\to N^E\otimes_{D_E} E$$
defined by
$$\tau'_E(\psi\otimes e) = \tau(\psi)\otimes e.$$
Composition with the isomorphism of Problem 15 gives an $R$ homomorphism
$$\tau_E = \Phi_N\circ\tau'_E\circ\Phi_M^{-1} : M_E\to N_E.$$
We show that $\varphi\mapsto\varphi^E$ and $\tau\mapsto\tau_E$ are inverses. If a map $\varphi$ in $\operatorname{Hom}_R(M_E, N_E)$ is given, we are to calculate $(\varphi^E)_E\in\operatorname{Hom}_R(M_E, N_E)$. It is enough to find the effect of $(\varphi^E)_E$ on elements $\Phi_M(\psi\otimes e)$ with $\psi\in M^E$ and $e\in E$. For such an element,
$$\begin{aligned}(\varphi^E)_E(\Phi_M(\psi\otimes e)) &= \Phi_N((\varphi^E)'_E(\psi\otimes e)) = \Phi_N(\varphi^E(\psi)\otimes e)\\ &= \varphi^E(\psi)(e) = \varphi(\psi(e)) = \varphi(\Phi_M(\psi\otimes e)).\end{aligned}$$
Thus $(\varphi^E)_E = \varphi$. Similarly for $\tau\in\operatorname{Hom}_{D_E}(M^E, N^E)$, we find that $(\tau_E)^E = \tau$. Thus $\varphi\mapsto\varphi^E$ and $\tau\mapsto\tau_E$ are inverses.

18. Let us write $M = \sum_{s\in S} M_s$ with each $M_s$ simple. Each $M_s$ is contained in some $M_E$, and hence $M = \sum_{E\in\mathcal E} M_E$. Let us see that the sum is direct. If $M_E$ has nonzero intersection with $M_{E_1}+\cdots+M_{E_n}$, where $E_1,\dots,E_n$ are simple $R$ modules with no two isomorphic, then there is a nonzero $R$ linear map from $E$ into $M_{E_1}+\cdots+M_{E_n}$. We can write each $M_{E_j}$ as a sum of simple $R$ submodules isomorphic to $E_j$, and Proposition 10.1 shows that
$$M_{E_1}+\cdots+M_{E_n} = \bigoplus_{s\in T} M'_s$$
with each $M'_s$ isomorphic to one of $E_1,\dots,E_n$. If all of $E_1,\dots,E_n$ are nonisomorphic with $E$, then Lemma 10.3 and Proposition 10.4a show that
$$\operatorname{Hom}_R(E, M_{E_1}+\cdots+M_{E_n}) = 0,$$
a contradiction. We conclude that the sum $M = \sum_{E\in\mathcal E} M_E$ is direct. This proves the equality at the left in the displayed formula of the problem, and the isomorphism on the right in that display follows from Problem 15.
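The decomposition $M = \bigoplus_E M_E\cong\bigoplus_E M^E\otimes_{D_E} E$ can be checked by dimension count in a special case. This sketch relies on facts from outside this hint, which are assumptions here. For $R = \mathbb C G$ with $G$ finite, Schur's lemma gives $D_E = \mathbb C$, so $\dim_{\mathbb C} M_E = m_E\dim E$ with $m_E = \dim_{\mathbb C}\operatorname{Hom}_R(E, M)$. The multiplicity $m_E$ equals the character inner product $\langle\chi_M,\chi_E\rangle$. The sketch uses the character table of $S_3$ and two sample modules: the permutation representation on $\mathbb C^3$ and the regular representation. All names are hypothetical.

```python
from fractions import Fraction
from typing import Dict, List, Tuple

# Conjugacy classes of S3: identity, transpositions, 3-cycles.
CLASS_SIZES: List[int] = [1, 3, 2]
GROUP_ORDER = sum(CLASS_SIZES)  # |S3| = 6

# Irreducible (simple) characters of S3 on the classes above; all are real-valued.
IRREDUCIBLES: Dict[str, List[int]] = {
    "trivial": [1, 1, 1],
    "sign": [1, -1, 1],
    "standard": [2, 0, -1],
}


def inner(chi: List[int], psi: List[int]) -> Fraction:
    """<chi, psi> = (1/|G|) * sum over g of chi(g) * conj(psi(g)); real characters here."""
    total = sum(n * x * y for n, x, y in zip(CLASS_SIZES, chi, psi))
    return Fraction(total, GROUP_ORDER)


def isotypic_dimensions(chi_m: List[int]) -> Dict[str, Tuple[Fraction, Fraction]]:
    """For each simple E, return (m_E, dim M_E), where m_E = dim Hom(E, M)
    and dim M_E = m_E * dim E because M_E is isomorphic to M^E (x) E."""
    result = {}
    for name, chi_e in IRREDUCIBLES.items():
        m_e = inner(chi_m, chi_e)
        result[name] = (m_e, m_e * chi_e[0])
    return result


permutation = [3, 1, 0]  # character of S3 permuting the coordinates of C^3
regular = [6, 0, 0]      # character of the regular representation C[S3]

for chi in (permutation, regular):
    dims = isotypic_dimensions(chi)
    print(dims, sum(d for _, d in dims.values()) == chi[0])
```

For the permutation representation, the multiplicities come out as 1, 0, 1. So $\mathbb C^3$ splits into the trivial and standard isotypic components, of dimensions 1 and 2. For the regular representation, the isotypic components have dimensions 1, 1, and 4.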

19. If $N$ is a left $R$ submodule of $M$, then $N_E\subseteq M_E$ for every $E$. Conversely the previous problem shows that a system of $N_E$'s defines an $R$ submodule $N$. Thus this problem is a restatement of Problem 16.

20. We have
$$\operatorname{Hom}_R(M, N) = \prod_{E\in\mathcal E}\operatorname{Hom}_R(M_E, N) = \prod_{E\in\mathcal E}\operatorname{Hom}_R(M_E, N_E),$$
and the rest follows from Problem 17.
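One consequence can be tested numerically. This is an illustration under assumptions not made in the hint: $R = \mathbb C G$ with $G$ finite, and $D_E = \mathbb C$ by Schur's lemma. Under these assumptions, the formula above together with Problem 17 gives $\operatorname{Hom}_R(M, N)\cong\prod_E\operatorname{Hom}_{\mathbb C}(M^E, N^E)$. Hence $\dim\operatorname{Hom}_R(M, N) = \sum_E m_E(M)\,m_E(N)$. The sketch compares this count with the character inner product $\langle\chi_M,\chi_N\rangle$, a standard formula for $\dim\operatorname{Hom}_G(M, N)$ that is assumed here. It uses the character table of $S_3$, and its names and sample modules are hypothetical.

```python
from fractions import Fraction
from typing import List

# Class sizes and irreducible characters of S3 (identity, transpositions, 3-cycles).
CLASS_SIZES: List[int] = [1, 3, 2]
GROUP_ORDER = 6
IRREDUCIBLES: List[List[int]] = [[1, 1, 1], [1, -1, 1], [2, 0, -1]]


def inner(chi: List[int], psi: List[int]) -> Fraction:
    """Character inner product (1/|G|) sum_g chi(g) conj(psi(g)); characters are real here."""
    return Fraction(sum(n * x * y for n, x, y in zip(CLASS_SIZES, chi, psi)), GROUP_ORDER)


def hom_dim_via_isotypic(chi_m: List[int], chi_n: List[int]) -> Fraction:
    """Sum over simple E of dim Hom_C(M^E, N^E) = m_E(M) * m_E(N)."""
    return sum((inner(chi_m, e) * inner(chi_n, e) for e in IRREDUCIBLES), Fraction(0))


permutation = [3, 1, 0]  # permutation representation of S3 on C^3
regular = [6, 0, 0]      # regular representation C[S3]
for chi_m, chi_n in [(permutation, regular), (permutation, permutation), (regular, regular)]:
    print(hom_dim_via_isotypic(chi_m, chi_n), inner(chi_m, chi_n))
```

The two columns printed agree: 3, 2, and 6 in the three cases.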
This list indexes recurring symbols introduced in Chapters I through X (pages
1-591). For other recurring symbols, including set-theoretic notation introduced
In the list below, each piece of notation is regarded as having a key symbol.
The first group consists of those items for which the key symbol is a fixed Latin
next group consists of those items for which the key symbol is a Greek letter.
The final group consists of those items for which the key symbol is a variable or
o(n, v), 348
$A^{\mathrm{adj}}$, 72
Ann(U), 52
Aut H, 167
$B^n(G, N)$, 356
$C_m$, 126
$C(G, \mathbb C)$, 330
$C(G, R)$, 381
c(A), 395
$C^n(G, N)$, 356
Cl(x), 165
Cliff(E, (·,·)), 302
D, 511, 532
$D_n$, 122
deg, 150, 156
det, 67, 215
dim V, 37
$\operatorname{End}_K(F)$, 372
$\operatorname{End}_R(M)$, 554
$e_1, \dots, e_n$, 36
G, ft, g, 527
F, 9, 34, 158
$\mathbb F_4$, 143
$\mathbb F_p$, 142, 148
F,, 461
F(S), 307, 377
ns), 159
Gal(K/k), 474
GCD, 2, 394
GL(V), 122
GL(n, F), 122
$\mathbb H$, 128
Hg, 128
H(V), 302
$H^n(G, N)$, 356
$\operatorname{Hom}(\psi, \varphi)$, 568
$\operatorname{Hom}_F(G, V)$, 43, 44
HomicfG, V), 266
$\operatorname{Hom}_R(M, N)$, 554
i, j, k, 128
K, k, K/k, 453
ker L, 46
ker φ, 131
l, 332
LCM, 32
lrad, 250
$M_n(R)$, 215
$M_{kn}(F)$, 25
$M_{mn}(R)$, 376
Morph(A, B), 189
$N_{K/k}(a)$, 519
N(H), 188
O, 304
(9, 343
O(V), O(n), 122
Obj(C), 189
$\mathcal C^{\mathrm{opp}}$, 191, 210
Pfaff(X), 299, 449
$PSL(2, \mathbb Z)$, 366
$PSL(2, \mathbb Z/m\mathbb Z)$, 366
PSL(n, F), 205
$\mathbb Q[\theta]$, 122, 143
r, 332
rrad, 250
$S_n$, 121
S(E), 284
$S^n(E)$, 284
$S' = S\cup S^{-1}$, 307
SL(V), 122
SL(n, F), 122
SO(V), SO(n), 122
SU(V), SU(n), 122
sgn, 17
$\operatorname{span}\{v_\alpha\}$, 35
T(E), 281
$T^n(E)$, 281
Tr A, 74
$\operatorname{Tr}_{K/k}(a)$, 519
$U(\mathfrak g)$, 301
U(V), U(n), 122
W(S'), 307
W(V), 302
wt(c), 206
$Z_G$, 165
$Z_G(x)$, 165
$\mathbb Z/m\mathbb Z$, 120
$\mathbb Z/(m)$, 120
$\mathbb Z[\sqrt{-1}]$, 392
$Z^n(G, N)$, 356
$\mathbb ZG$, 373


Greek 

r,  a,  44 
(0. 45 
(Tr)-45 

$\delta_n$, 356
$\delta(C)$, 206
$\iota : V\to V''$, 54
£, 48
XR, 339
$\varphi$, 7
<px, 454
$\Phi_n(X)$, 490

Operations  on  sets  given 
by  superscripts 

V',  50 
G',  313 
M-1,  96 
G\  251 
L*,  100 
A*,  101 
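$\bar V$, 115
$\widehat G$, 329
$A^t$, 41
$L^t$, 53

$M^o$, $R^o$, 555
$\mathbb C^\times$, $\mathbb Q^\times$, $\mathbb R^\times$, $\mathbb Z^\times$, 120
$(\mathbb Z/m\mathbb Z)^\times$, 142
tfx, 143
$P^{-1}$, 439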

Specific  functions 

(·, ·), 90
‖·‖, 91
[·, ·], 301
(·, ·), 249
[K : k], 456
Isolated symbols

≅, 48, 119, 144
≡, 120
1, 118
{1}, 118
$1_A$, 190

Operations  on  sets  and  classes 

G/H,  G\H,  130 
gH ,  Hg,  129 
$G_1\times G_2$, 126
$G\times_\tau H$, 169
$G_1 * G_2$, 324
Gp,  163 

$G = \langle S; R\rangle$, 314
RG ,  380 
F[X],  9 
$R[X]$, 149

155 

$k[x_1, \dots, x_n]$, 454
$k(x_1, \dots, x_n)$, 454
K(X),  384 
Cs,  196 
$V/U$, 55
M/N, 378
$I + J$, 405
$IJ$, 405, 435
$U\oplus V$, 59


$E\otimes_K F$, 265
$e\otimes f$, 265
$M\otimes_R N$, 574
$m\otimes n$, 574
575 

$\bigwedge(E)$, 291
$\bigwedge^n(E)$, 291
$E^{\mathbb C}$, 274
£\  275 

$\bigoplus_{s\in S}$, 62, 138, 376
$\prod_{s\in S}$, 62, 136, 198, 376
$\coprod_{s\in S}$, 199
>fGes ,  323 

$K^H$, 474
$S^{-1}R$, 428
Rs,  428 
£/,,  430 
GP,  534 

Miscellaneous 

$\begin{pmatrix}1&2&3&4&5\\4&3&5&1&2\end{pmatrix}$, permutation, 15

(5 2 3), cycle, 16
/i * A, convolution, 339
(a), principal ideal, 390
$(a_1, \dots, a_n)$, ideal, 390
Abel, 494
abelian group, 119

direct  sum  for,  138,  139 
finitely  generated,  176 
free,  176 

tensor  product  for,  578 
absolute  value,  604 
addition  in  abelian  group,  119 
addition  in  ring,  141 
addition  in  vector  space,  34 
addition  of  cardinal  numbers,  613 
addition  of  matrices,  25 
additive  extension,  574 
additive  functor,  585 
additive  in  a  variable,  574 
adjoin,  454 
adjoint,  100,  101 
classical,  72 
algebra,  280 
alternative,  304 
associative,  280,  372 
associative  R,  380 
Clifford,  302 
division,  373 
exterior,  291 
filtered  associative,  301 
graded  associative,  301 
group,  380,  445 
Heisenberg  Lie,  302 
Jordan,  303 
Lie, 281, 301
polynomial,  289 
symmetric,  284 
tensor,  282 

tensor  product  for,  582 
universal enveloping, 301
Weyl,  302 

algebraic  closure,  465 
existence,  466 


uniqueness,  467-468 
algebraic  curve,  411 
algebraic  element,  454 
algebraic  extension,  456 
finite, 456
simple, 457

algebraic integer, 342, 411, 421, 515
algebraic number, 123, 387, 457, 465, 515
algebraic  number  field,  123,  373,  387,  457 
algebraically  closed,  464 
algebraically  closed  field,  212 
alternating,  67 

alternating  bilinear  form,  253 
alternating  group,  121,  171 
alternating  matrix,  257 
alternative  algebra,  304 
annihilator,  52,  85 
antisymmetrized  tensor,  294 
antisymmetrizer,  294 
area,  86 

Artin-Schreier  Theorem,  550,  552 
ascending  chain  condition,  421,  565 
associate,  393 
associated  graded  map,  300 
associated  graded  vector  space,  300 
associated  primitive  polynomial,  396 
associative  algebra,  280,  372 
filtered,  301 
graded,  301 
tensor  product  for,  582 
associative  law,  25,  34,  82,  118,  141 
associative  R  algebra,  380 
associativity  formula,  580,  581 
associator,  304 
automorphism,  453 
inner,  201 
of  group,  167 
of number field, 124
Axiom  of  Choice,  597 
Baer multiplication, 355, 361
basis,  36,  176 
dual,  51 
free,  312 
standard,  36 
standard  ordered,  48 
vector  space,  36 
Weyl,  296 
BCH  code,  548 
Bessel's  inequality,  94 
Bezout’s  identity,  3 
bilinear,  90 
bilinear  form,  249 
alternating,  253 
invariant,  260 
nondegenerate,  251 
skew-symmetric,  253 
symmetric,  253 
bilinear  function,  263,  574 
bilinear  map,  263 
bilinear  mapping,  263 
bimodule,  573 
block,  232 

block  multiplication,  86 
Bolzano-Weierstrass Theorem, 603
boundary  map,  583 
Burnside’s  Theorem,  345 

cancellation  law,  118 
canonical  form,  212 
Jordan,  232,  409 
of  rectangular  matrix,  242 
rational,  245,  447,  448 
canonical  map  into  double  dual,  54 
canonical-form  problem,  214 
Cantor,  612 

Cardan's  formula,  492,  510,  513 
Cardano,  493 
cardinal  number,  610 
addition  of,  613 
cardinality,  610 
Cartan  matrix,  86 
Cartesian  product,  595 
indexed,  597 
category,  53,  135,  189 
opposite,  191,  210 

Cauchy’s  Theorem  in  group  theory,  185 


Cayley  number,  304 
Cayley-Dickson  construction,  304 
Cayley-Hamilton  Theorem,  221 
Cayley’s  Theorem,  125 
center,  372,  380,  554 
of  group,  165 
centralizer of element, 165
chain,  583,  605 
chain  condition 

ascending,  417,  565 
descending,  565 
change  of  rings,  573,  578 
character,  339 
multiplicative,  329 
characteristic  of  a  field,  148 
characteristic  polynomial,  74,  218 
characteristic  subgroup,  360 
check  matrix,  548 

Chinese  Remainder  Theorem,  6,  405 
class,  594,  595 
equivalence,  600 
class  equation  of  group,  187 
class  function,  340 
classical  adjoint,  72 
Clifford  algebra,  302 
closed,  583 
closed  form,  584 
coboundary,  356 
coboundary  map,  356 
cochain,  356 
cocycle,  356 
code,  207 
BCH,  548 
cyclic,  547 

cyclic  redundancy,  209 
dual,  363 

error-correcting,  206,  363,  547 
Hamming,  207 
linear,  207 
parity-check,  207 
repetition,  207 
self-dual  linear,  363 
codomain,  596 
coefficient,  9,  149 
Fourier,  330,  362 
leading,  150 
matrix,  336 
cofactor,  70,  217 
cohomology group, 356
cohomology  of  groups,  355,  584 
collection,  595 
column  space,  38 
column  vector,  25 
in  an  ordered  basis,  45 
common  multiple,  32 
commutative  diagram,  194 
commutative  law,  25,  34,  83,  119 
commutative  ring,  141 
commutator,  360 
commutator  subgroup,  313 
complement,  595 
completely  reducible,  555 
complex,  583,  585 

complex  conjugate  of  vector  space,  115 
complex  conjugation,  604 
complex  number,  604 
complexification,  274 
composition,  598 
composition  factor,  173,  561 
composition  series,  172,  560 
congruent modulo, 120
conjugacy  class,  165 
conjugate,  165 
conjugate  linear,  90 
conjugates  of  an  element,  523 
conjugation,  complex,  604 
consecutive  quotient,  172,  560 
constant  polynomial,  10,  150,  155 
constructible  coordinates,  470 
field  of,  470-471 

constructible  regular  polygon,  473,  489, 
contraction  of  ideal,  432 
contragredient,  53 
matrix  of,  53 

contragredient  representation,  365 
contravariant  functor,  193 
convolution,  339,  372,  381 
coproduct  functor,  199,  376,  589 
in a category, 198
corner variable, 21
correspondence,  one-one,  598 
coset 
left,  129 
right, 129
countable,  xx 
counting  formula,  164 


covariant  functor,  192 
Cramer’s  rule,  24,  72,  217 
CRC-8,  209 

crossed  homomorphism,  357 
cubic  polynomial,  542 
cubic  resolvent,  545 
cut,  602 
cycle,  15 

cycle  structure,  166 
cycles,  disjoint,  16 
cyclic 
code,  547 
group,  125 
R  module,  401 
redundancy  code,  209 
subspace,  244 
vector,  244 

cyclotomic  field,  490,  500 
cyclotomic  polynomial,  399,  490,  540 

dal  Ferro,  493 
de  Rham  cohomology,  584 
decomposition  group,  534 
Dedekind  domain,  416,  437,  450,  525 
degree, 10, 150, 154, 456
dependent,  integrally,  421 
derivative of polynomial, 461
descend  to,  57,  133,  147,  375 
descending  chain  condition,  565 
determinant,  65,  86,  215 
Gram,  114 
of  linear  map,  66 
of  matrix,  66 
of  square  matrix,  67 
properties  of,  68,  216 
Vandermonde,  71,  217 
diagonal  entry,  24,  180,  447 
diagonal  matrix,  24,  447 
diagram,  194 

commutative,  194 
square,  194 
difference,  595 
difference  product,  511 
differential  equations,  system,  246 
differential  form,  584 
differentiation,  461 
dihedral  group,  121,  170,  316 
dimension,  564 
of vector space, 37, 78
direct  image,  599 
direct  product 

of  groups,  126,  127,  136,  137 
of  R  modules,  376 
of  rings,  374 
of  vector  spaces,  62,  63 
direct  sum 

of  abelian  groups,  138,  139 
of  R  modules,  376 
of  vector  spaces,  59,  60,  61,  62,  64 
Dirichlet’s  theorem  on  primes  in  arithmetic 
progressions,  330,  367 
discriminant,  511,  532,  533 
disjoint  cycles,  16 
disjoint  union,  198 
distributive  law,  26,  34,  141 
divide, 1, 10, 388, 438
division  algebra,  373 
division algorithm, 2, 11
division  ring,  144,  373 
divisor,  1 

elementary,  179,  447 
greatest  common,  2,  8,  12,  393 
zero,  144 
Dixmier,  559 
domain,  596 

Dedekind,  416,  437,  450,  525 
Euclidean,  392,  444,  446 
integral,  144 
principal  ideal,  390,  442 
unique  factorization,  389 
dot  product,  90 
double  a  cube,  469,  471 
double  dual,  54 
dual 

double,  54 
of  vector  space,  50 
dual basis, 51
dual  code,  363 

duality  in  category  theory,  210 

eigenspace  of  linear  function,  76 
eigenspace  of  matrix,  73 
eigenvalue  of  linear  function,  76 
eigenvalue  of  matrix,  73 
eigenvector  of  linear  function,  76 
eigenvector  of  matrix,  73 


Eisenstein's  irreducibility  criterion,  398 
element,  593,  594 
elementary  divisor,  179,  447 
elementary  matrix,  28 
elementary  row  operation,  20 
elementary  symmetric  polynomial,  448 
entity,  593 
entry,  20,  24 

diagonal,  24,  180,  447 
enveloping algebra, universal, 301
equality  of  matrices,  24 
equation,  linear,  23 
equivalence  class,  600 
equivalence  relation,  599-600 
equivalent 
factor  set,  352 
finite  filtrations,  561 
group  extensions,  352 
normal  series,  174 
words,  307 

equivariant  mapping,  191 
error-correcting  code,  206,  363,  547 
Euclid's  Lemma,  5 
Euclidean algorithm, 2, 13
Euclidean  domain,  392,  444,  446 
Euler φ function, 7
evaluate,  10 
evaluation,  151,  157 
even  permutation,  121 
exact,  583,  584 
exact  form,  584 
exact  sequence,  584,  585 
short,  585 
split,  588 
expansion 

homogeneous-polynomial,  155 
in  cofactors,  70,  217 
monomial,  155 

expressible  in  terms  of  k  and  radicals,  495 
extension 
additive,  574 
algebraic,  456 
field,  453 
finite,  456 
finite  algebraic,  456 
finite  Galois,  485 
group,  348 
linear,  44,  264 
normal, 481
of  ideal,  432 
of scalars, 275, 573, 578
separable,  476 
simple  algebraic,  457 
exterior algebra, 291
external  direct  product 
of  groups,  126,  136 
of  R  modules,  376 
external  direct  sum 
of  abelian  groups,  138 
of  R  modules,  376 
of vector spaces, 59, 61
external  semidirect  product  of  groups,  169 

factor, 1, 10, 136, 388, 438
factor  group,  132 
factor  ring,  146 
factor  set,  348 
Factor Theorem, 11
factor  through,  57,  133,  147,  378 
factorization,  1,  10 
nontrivial,  2,  10 
prime,  5 
unique,  5,  14 
family,  595 

fast Fourier transform, 331, 364
Fermat  number,  472 
Fermat  prime,  472 
Fermat's  Little  Theorem,  142 
Ferrari,  493 
field,  142 

algebraically  closed,  212,  464 
characteristic  of,  148 
cyclotomic,  490,  500 
extension,  453 

finite,  143,  153,  159,  373,  461, 488 
fixed,  474 
formally  real,  550 
Galois,  461 

number, 123, 373, 387, 457
obtained  by  adjoining,  454 
of  constructible  coordinates,  470-471 
of  fractions,  383,  601 
ordered,  550 
prime,  148 

quadratic  number,  422,  543 
real  closed,  550 


splitting,  458 
field  isomorphism,  453 
field  map,  453 
field  mapping,  453 
field  polynomial,  519 
filtered  associative  algebra,  301 
filtered  vector  space,  300 
filtration,  finite,  560 
finite 

algebraic  extension,  456 
basis  condition,  417,  565 
extension,  456 

field,  143,  153,  159,  373,  461,  488 
filtration,  560 
Galois  extension,  485 
length,  563 

linear  combination,  35 
order,  130 
rank,  178 

rank  of  free  R  module,  401 
support,  381 

finite-dimensional  vector  space,  37 
finitely  generated  abelian  group,  176 
fundamental  theorem  for,  179 
finitely  generated  group,  315 
finitely  generated  R  module,  400 
finitely  presented  group,  315 
First  Isomorphism  Theorem,  57,  133,  379 
Fitting’s  Lemma,  588 
fixed  field,  474 
forgetful  functor,  192 
form,  263 

bilinear,  see  bilinear  form 
Hermitian,  258 

sesquilinear,  see  sesquilinear  form 
skew-Hermitian,  258 
formally  real  field,  550 
Fourier  coefficient,  330,  362 
Fourier  inversion  formula 
for  class  functions,  341 
for  finite  abelian  group,  330 
for  finite  group,  338 
Fourier  inversion  problem,  330 
Fourier series, 330
fractional  ideal,  450 

unique factorization of, 451
fractions 

field  of,  383,  601 
partial, 444

free  abelian  group,  176 
free  basis,  312 
free  group,  308 
rank,  314 

free  product,  199,  323 
free  R  module,  377 
free subset, 312
Frobenius  map,  462 
function,  595 
bilinear,  263 
class,  340 
k-linear,  263 
k-multilinear,  263 
linear,  42,  44 
multilinear,  263 
polynomial,  153,  158 
functional,  linear,  50 
functional,  multilinear,  66 
functor,  53,  135 
additive,  585 
contravariant,  193 
coproduct,  199,  376,  589 
covariant,  192 
forgetful,  192 
product,  196,  376 
Fundamental  Theorem 
of Algebra, 14, 465, 492
of  Arithmetic,  5 

of  Finitely  Generated  Abelian  Groups,  179 
of  Finitely  Generated  Modules,  402,  447 
of Galois Theory, 345, 490

Galois,  494 

Galois  extension,  finite,  485 
Galois  field,  461 
Galois  group,  474 
Galois  theory,  123,  484 
Gauss, 473, 489, 500
Gauss's  Lemma,  395 
Gaussian  integer,  392,  446 
general  linear  group,  122 
generated  by,  125 
generated  submodule,  377-378 
generating  polynomial,  209,  547 
generator,  125,  176,  399 
monic,  244 
generators,  314 


graded associative algebra, 301
graded  vector  space,  300 
Gram  determinant,  114 
Gram matrix, 114

Gram-Schmidt  orthogonalization  process,  95 
greatest  common  divisor,  2,  8,  12,  393 
greatest  lower  bound,  603 
group,  118 
abelian,  119 
alternating,  121,  171 
automorphism  of,  167 
center  of,  165 
cohomology,  356 
cyclic,  125 
decomposition,  534 
dihedral,  121,  170,  316 
direct  product  for,  126,  127,  136,  137 
finitely  generated,  315 
finitely  presented,  315 
free,  308 
free  abelian,  176 
free  product  for,  323 
Galois,  474 
general  linear,  122 
homomorphism  of,  131 
icosahedral,  368 
octahedral,  368 
of  units,  143 
order  of,  129 
orthogonal,  122 
quaternion,  128 
quotient  of,  132 
rotation,  122 

semidirect product for, 169
simple,  171 
solvable,  494 
special  linear,  122 
special unitary, 122
symmetric,  121 
tetrahedral,  368 
trivial,  118 
unitary,  122 
group  action,  124,  159 
transitive,  163 
trivial,  161 

group  algebra,  380,  445 
group  extension,  348 
group  ring,  integral,  373 
Hamming code, 207
Hamming  distance,  206 
Hamming  space,  206 
harmonic  analysis,  506 
harmonic  polynomial,  116 
Heisenberg  Lie  algebra,  302 
heptadecagon,  503 
Hermite,  515 
Hermitian,  101 
Hermitian  form,  258 
Hermitian  matrix,  259 
Hermitian  sesquilinear  form,  258 
Hermitian  symmetric,  90 
Hilbert  Basis  Theorem,  416,  418 
Hilbert-Schmidt  norm,  112 
homogeneous  element,  281 
homogeneous  ideal,  284 
homogeneous  polynomial,  116,  155 
homogeneous  system,  23 
homogeneous-polynomial  expansion,  155 
homomorphism 
crossed,  357 
of  groups,  131 
of  R  modules,  375 
of  rings,  144 
substitution,  151,  156 

icosahedral  group,  368 
ideal,  145 

contraction  of,  432 
extension  of,  432 
fractional,  450 
left,  378 
maximal,  385 
prime,  384 
principal,  390 
right,  378 
two-sided,  145 
unique  factorization  of,  438 
identity  element,  118 
identity  in  a  ring,  142 
identity  matrix,  27 
identity  morphism,  190 
image,  596 
direct,  599 
inverse,  599 
of  homomorphism,  131 


imaginary  part,  604 
independent variable, 21
indeterminate,  9,  149,  154,  155 
index  of  subgroup,  164 
indexed  Cartesian  product,  597 
indexed  intersection,  597 
indexed  union,  597 
infimum,  603 
infinite  order,  130 

infinite-dimensional  vector  space,  78 

inhomogeneous  system,  23 

injection,  59,  62 

inner  automorphism,  201 

inner  product,  90 

inner-product  space,  90 

integer,  algebraic,  342,  411,  421,  515 

integer,  Gaussian,  392,  446 

integers modulo, 120

integral,  421 

integral  closure,  416,  421 
integral  domain,  144 
integral  group  ring,  373 
integrally  closed,  425 
integrally dependent, 421
Intermediate  Value  Theorem,  603 
internal  direct  product 
of  groups,  127,  137 
of  R  modules,  376 
of  vector  spaces,  63 
internal  direct  sum 
of  abelian  groups,  139 
of  R  modules,  377 
of  vector  spaces,  60,  61,  64 
internal  semidirect  product  of  groups,  169 
intersection,  595 
indexed,  597 

intertwining  operator,  333 
invariant 

leave  a  bilinear  form,  260 
of  group  action,  357 
invariant  subspace,  73,  333 
invariant  vector  subspace,  218 
inverse,  192 

multiplicative,  143 
inverse  element,  118 
inverse  function,  598 
inverse  image,  599 
inverse  matrix,  27 
invertible matrix, 27
involution,  242 
irreducible  element,  388 
irreducible  left  R  module,  555 
irreducible  representation,  333 
isometry,  159 

isomorphic, 48, 119, 144, 164, 192, 352, 378

isomorphism, 48, 119, 144, 192, 378, 453
natural,  268 
isotropic  subspace,  296 
isotropy  subgroup,  163 
isotypic  submodule,  589 
Iwasawa  decomposition,  113 

Jacobi  identity,  301 

Jordan  algebra,  303 

Jordan  block,  231,  409 

Jordan  canonical  form,  232,  409 

Jordan  form,  231 

Jordan  normal  form,  231 

Jordan-Chevalley  decomposition,  243,  549 

Jordan-Hölder Theorem, 176, 562

k  automorphism,  453 
k  isomorphism,  453 
k-linear, 66
function,  263 
map,  263 
mapping,  263 
k-multilinear
function,  263 
map,  263 
mapping,  263 

kernel  of  homomorphism,  131 
kernel  of  linear  map,  46 
Kronecker  delta,  xx,  27 
Kronecker  product,  297 

Lagrange  resolvents,  506 
Lagrange’s  Theorem,  130 
law of composition, 190
law  of  cosines,  91 

law  of  quadratic  reciprocity,  499,  544 

leading coefficient, 150

leading  term,  150 

least  common  multiple,  32 

least  upper  bound,  603,  606 


leave  a  bilinear  form  invariant,  260 

left  coset,  129 

left  ideal,  378 

left  R  module,  374 

left  radical,  250 

left  regular  representation,  332,  338,  365 

left  vector  space,  556 

left-coset  space,  130 

Legendre  polynomial,  114 

length  of  module,  563 

length  of  word,  307 

letter,  121 

Lie  algebra,  281,  301 
Heisenberg,  302 
Lie  bracket,  301 
Lindemann,  515 
linear,  42,  44 
linear  code,  207 
self-dual,  363 
linear  combination,  35 
linear  equation,  23 
linear  extension,  44,  264 
linear  fractional  transformation,  160 
linear  function,  see  linear  map 
linear  functional,  50 
linear  map,  42,  44 
determinant  of,  66 
eigenspace  of,  76 
eigenvalue  of,  76 
eigenvector  of,  76 
kernel  of,  46 
normal,  110 
orthogonal,  103 
positive  definite,  107 
positive  semidefinite,  107 
unitary,  103 

linear  mapping,  see  linear  map 
linear  operator,  42 

linear  transformation,  see  linear  map 
linearly  independent  set,  36,  176 
local  ring,  434 
localization,  416 

of  R  at  the  prime  P,  430 
of  R  with  respect  to  S,  429 
lower  bound,  603 

MacWilliams identity, 364
map,  596 
bilinear, 263
coboundary,  356 
field,  453 
k-linear, 263
k-multilinear, 263
linear,  42,  44 
multilinear,  263 
mapping,  see  map 
matrix,  24 

addition  for,  25 
alternating,  257 
Cartan,  86 
check,  548 
coefficient,  336 
column  space  of,  38 
determinant  of,  66,  67 
diagonal,  24,  447 
eigenspace  of,  73 
eigenvalue  of,  73 
eigenvector  of,  73 
elementary,  28 
equality  for,  24 
Gram, 114
Hermitian,  259 
identity,  27 
inverse,  27 
invertible,  27 
multiplication  for,  26 
nilpotent,  232 
nonsingular,  212,  217 
null  space  of,  38 

of  a  linear  map  in  two  ordered  bases,  45 

orthogonal,  103 

positive  definite,  107 

positive  semidefinite,  107 

rank of, 41

row  space  of,  38 

scalar  multiplication  for,  25 

singular,  212,  217 

skew-symmetric,  257 

square,  24 

symmetric,  253 

symplectic,  450 

trace  of,  74 

transpose  of,  41 

unitary,  103 

Vandermonde,  71,  217 

zero,  25 


matrix  representation,  332 
matrix ring, 371
maximal  element,  605 
maximal  ideal,  385 
maximum  condition,  417,  565 
member,  595 
minimal  distance,  207 
minimal  polynomial,  221,  223,  455 
minimum  condition,  565 
module 
cyclic, 401

direct  product  for,  376 
direct  sum  for,  376 
finitely  generated,  400 
free  R,  377 

homomorphism  of,  375 
irreducible,  555 
left  R,  374 
of finite rank, 401
quotient,  378 
rank  of,  402 
right  R ,  375 
semisimple,  555 
simple,  555 
tensor  product  for,  574 
modulo,  120 
monic  generator,  244 
monic polynomial, 150
monomial,  155 
monomial  expansion,  155 
morphism,  189 
identity,  190 
multilinear  form 
symmetric,  283 
function,  263 
functional,  66 
map,  263 
mapping,  263 
multiple,  1,  10 
least  common,  32 
multiplication 
Baer,  355,  361 
in  a  group,  118 
in  a  ring,  141 
in  an  algebra,  280 
of  matrices,  26 
multiplicative  character,  329 
multiplicative inverse, 143
multiplicative system, 428
multiplicity  of  a  root,  14 

n-fold tensor product, 280
Nakayama's  Lemma,  436 
natural  isomorphism,  268 
natural  transformation,  268 
negative,  xx,  119 
nicely  normed,  305 
Nielsen-Schreier  Theorem,  318 
nilpotent,  549 
element,  443 
matrix,  232 
Noetherian  ring,  418 
nondegenerate  bilinear  form,  251
nonsingular,  212,  217 
nontrivial  factorization,  2,  10 
norm,  91,  519,  544 
Hilbert-Schmidt,  112 
normal  extension,  481
normal  linear  map,  110
normal  series,  equivalent,  174 
normal  series  of  groups,  172 
normal  subgroup,  131 
normalizer  of  subgroup,  188 
null  space,  38 
Nullstellensatz,  412 
number 

algebraic,  123,  387,  457,  465,  515 
complex,  604 
rational,  601 
real,  602 

number  field,  123,  373,  387,  457 
automorphism  of,  124
quadratic,  422,  543 

object,  189 
octahedral  group,  368 
octonion,  304 
odd  permutation,  121 
one-one,  598 

one-one  correspondence,  598 
onto,  598 

operation,  elementary  row,  20 
operator 

intertwining,  333 
linear,  42 
projection,  226 


opposite  category,  191,  210 
opposite  ring,  555 
orbit,  163 
order 

finite,  130 
infinite,  130 
of  group,  129 
ordered  field,  550 
ordered  pair,  595 
ordering 
partial,  605 
simple,  286,  605 
total,  605 
well,  605 

ordinary  differential  equations,  system,  246 

orthogonal  complement,  97 

orthogonal  group,  122,  262 

orthogonal  linear  map,  103 

orthogonal  matrix,  103 

orthogonal  projection,  97 

orthogonal  set,  93 

orthogonal  vectors,  93 

orthonormal  basis,  93 

orthonormal  set,  93 

pair 

ordered,  595 
unordered,  595 
parallelogram  law,  91
parity-check  code,  207 
Parseval’s  equality,  98 
partial  fractions,  444 
partial  ordering,  605 
pentagon,  501

period  of  cyclotomic  field,  500 
permanence  of  identities,  215 
permutation,  15,  121 
even,  121 
odd,  121 

Pfaffian,  299,  449 
Plancherel  formula,  338 
Poincare-Birkhoff-Witt  Theorem,  301 
point,  595 

Poisson  summation  formula,  362 
polar  decomposition,  111
polarization,  92 
polynomial,  9,  149,  154 
associated  primitive,  396 
characteristic,  74,  218
constant,  10,  150,  155 
cubic,  542 

cyclotomic,  399,  490,  540 
elementary  symmetric,  448 
field,  519 

generating,  209,  547 
harmonic,  116 
homogeneous,  116,  155 
Legendre,  114 
minimal,  221,  223,  455 
monic,  150 
primitive,  394 
quartic,  541,  546 
separable,  476 
split,  458 

symmetric,  448,  544 
weight  enumerator,  209 
zero,  10,  150 
polynomial  algebra,  289 
polynomial  function,  153,  158 
polynomial  ring,  371 
positive,  xx 

positive  definite  linear  map,  107 
positive  definite  matrix,  107 
positive  semidefinite  linear  map,  107 
positive  semidefinite  matrix,  107 
power,  125 
presentation,  314 
primary  block,  232 
primary  decomposition,  229 
Primary  Decomposition  Theorem,  229 
primary  subspace,  229 
prime,  2,  10 
relatively,  6 
prime  element,  389 
prime  factorization,  5 
prime  field,  148 
prime  ideal,  384 
primitive  element,  480 
primitive  polynomial,  394 
associated,  396 
primitive  root,  490 
Principal  Axis  Theorem,  254 
principal  ideal,  390 
principal  ideal  domain,  390,  442 
product 

Cartesian,  595 


difference,  511
dot,  90 

free,  199,  323 
functor,  198,  376 
in  a  category,  196 
in  a  group,  118 
in  an  algebra,  280 
indexed  Cartesian,  597 
inner,  90 
Kronecker,  297 
n-fold  tensor,  280 
of  matrices,  26 
of  permutations,  15 
set-theoretic,  595 
tensor,  263 
triple  tensor,  277 
vector,  281 

projection,  59,  62,  226 
orthogonal,  97 
Projection  Theorem,  96 
proper  subset,  595 
properly  contained,  595 
pure  tensor,  265 
Pythagorean  Theorem,  91

quadratic  number  field,  422,  543 
quadratic  reciprocity,  499,  544 
quartic  polynomial,  541,  546 
quaternion,  128 
quaternion  group,  128 
quotient 
group,  132 

homomorphism,  132,  146 
map,  55 
module,  378 
ring,  146,  374 
space,  55,  130 

R  homomorphism,  375 
R  module,  375 
R  submodule,  377 
radical,  250,  253,  257, 495 
ramification  index,  527,  543 
range,  596 
rank 

of  free  abelian  group,  178 
of  free  group,  314 
of  free  R  module,  402 
of  matrix,  41

rational  canonical  form,  245,  447,  448 

rational  number,  601

real  closed  field,  550 

real  number,  602 

real  part,  604 

reduced  row-echelon  form,  20 
reduced  word,  325 
reducible  element,  389 
refinement,  174,  561 
reflexive,  600,  605 
regular 

17-gon,  503 
heptadecagon,  503 
pentagon,  501 
polygon,  473, 489,  499 
representation,  332,  337,  338,  365 
relation,  314,  595 
equivalence,  599-600 
function  as,  595 
partial  ordering  as,  605 
relatively  prime,  6 
repetition  code,  207 
representation,  161 
contragredient,  365 
irreducible,  333 
left  regular,  332,  338,  365 
matrix,  332 

right  regular,  332,  337,  338 
unitary,  332 

residue  class  degree,  527,  543 

restriction,  598 

restriction  of  scalars,  277 

Riemann  sphere,  160 

Riesz  Representation  Theorem,  99 

right  coset,  129 

right  ideal,  378 

right  R  module,  375 

right  radical,  250 

right  regular  representation,  332,  337,  338 
rigid  motion,  159 
ring,  141 

commutative,  141 
direct  product  for,  374 
division,  144,  373 
group,  373 

homomorphism  of,  144 
local,  434 


matrix,  371 
Noetherian,  418 
opposite,  555 
polynomial,  371
quotient  of,  146 
with  identity,  142 
zero,  142 

Rodrigues's  formula,  114
root,  10,  152 

multiplicity  of,  14 
primitive,  490 
tower,  495 
rotation,  43 
rotation  group,  122 
row  operation,  elementary,  20 
row  reduction,  21 
row  space,  38 
row  vector,  25 
row-echelon  form,  20 
Russell's  paradox,  593 

s-tuple,  196
scalar,  9,  19,  34,  89,  211
scalar  multiplication 
in  vector  space,  34 
of  matrices,  25 

scalars,  extension  of,  275,  573,  578 
scalars,  restriction  of,  277 
Schreier,  175,  348,  562 
Schreier  set,  319 

Schroeder-Bernstein  Theorem,  79,  610 
Schur  orthogonality,  335 
Schur's  Lemma,  333,  559 
Schwarz  inequality,  92 
Second  Isomorphism  Theorem,  58,  135,  379 
self-adjoint,  101 
self-dual  linear  code,  363 
semidirect  product  of  groups,  169 
semisimple,  549 
semisimple  left  R  module,  555 
separable  element,  476 
separable  extension,  476 
separable  polynomial,  476 
sesquilinear,  90 
sesquilinear  form,  258 
Hermitian,  258 
skew-Hermitian,  258 
set,  593,  594 
set  theory,  von  Neumann,  594
set  theory,  Zermelo-Fraenkel,  593 
set-theoretic  product,  595 
short  exact  sequence,  585 
sign  of  permutation,  17
signature,  255,  260 
significant  factor,  321 
similar  matrices,  48,  213 
simple  algebraic  extension,  457 
existence,  457 
uniqueness,  458 
simple  group,  171 
simple  left  R  module,  555 
simple  ordering,  286,  605 
simplicial  complex,  583 
simplicial  homology,  583 
simply  transitive  group  action,  163 
singleton,  595 
singular,  212,  217 
size,  24 

skew-Hermitian  form,  258 
skew-Hermitian  sesquilinear  form,  258 
skew-symmetric  bilinear  form,  253 
skew-symmetric  matrix,  257 
socle,  589 
solvable  group,  494 
span,  35,  36 
spanning  set,  36 
special  linear  group,  122 
special  unitary  group,  122 
Spectral  Theorem,  105 
split  exact  sequence,  588 
split  polynomial,  458 
splitting  field,  458 
existence,  458 
uniqueness,  459 
square  a  circle,  469,  472 
square  diagram,  194 
square  matrix,  24 
stabilizer,  163 
stable  subspace,  73 
standard  basis,  36 
standard  ordered  basis,  48 
Steinitz,  466 

straightedge  and  compass,  468 
subcategory,  190 
subfield,  144 
subgroup,  119 


characteristic,  360 
commutator,  313 
index  of,  164
isotropy,  163 
normal,  131 
normalizer  of,  188 
submodule,  377 
generated,  377-378 
isotypic,  589 
subring,  144 
subset,  595 
subspace,  35 
cyclic,  244 
invariant,  73,  333 
isotropic,  296 
primary,  229 
stable,  73 

substitution  homomorphism,  151,  156 
sum  of  two  cardinal  numbers,  613 
sum  of  vector  subspaces,  58 
superset,  595 
support,  finite,  381 
supremum,  603 
Sylow  p-subgroup,  185 
Sylow  Theorems,  185 
Sylvester’s  Law,  255,  260 
symmetric,  90,  101,  600 
Hermitian,  90 
symmetric  algebra,  284 
symmetric  bilinear  form,  253 
symmetric  group,  121,  159 
symmetric  matrix,  253 
symmetric  multilinear  form,  283 
symmetric  polynomial,  448,  544 
elementary,  448 
symmetrized  tensor,  290 
symmetrizer,  290 
symplectic  group,  262 
symplectic  matrix,  450 
system  of  linear  equations,  23 
system  of  ordinary  differential  equations,  246 

Tartaglia,  493 
tensor  algebra,  282 
tensor  product,  263 
n-fold,  280 
of  abelian  groups,  578 
of  modules,  574 
of  R  algebras,  582
triple,  277 

tetrahedral  group,  368 

Theorem  of  the  Primitive  Element,  123,  457,  480,  524

total  ordering,  605 
trace,  519,  544 
of  matrix,  74 

transcendental  element,  454 
transcendental  π,  472,  515
transformation 
linear,  42,  44 
linear  fractional,  160
natural,  268
transitive,  600,  605 
transitive  group  action,  163 
transpose  of  matrix,  41 
transposition,  16 
triangle  inequality,  605 
triangular  form,  219 
triple  tensor  product,  277 
trisect  an  angle,  469,  472 
trivial  group,  118
trivial  group  action,  161
tuple,  196

two-sided  ideal,  145 

UFD1, 389, 419 
UFD2,  389 
union,  595 
disjoint,  198 
indexed,  597 
unipotent,  550 
unique  factorization,  5,  14 
of  fractional  ideal,  451
of  ideal,  438 

unique  factorization  domain,  389 
unit,  1,  10 
in  a  ring,  143 
unit  vector,  93 
unital,  375 
unitary  group,  122 
unitary  linear  map,  103 
unitary  matrix,  103 
unitary  matrix  representation,  332 
unitary  representation,  332 
universal  enveloping  algebra,  301 
universal  mapping  property 


abstract,  200, 298 
of  Clifford  algebra,  302 
of  coproduct  in  a  category,  198
of  direct  product  of  groups,  136,  137 
of  direct  product  of  vector  spaces,  63-64 
of  direct  sum  of  abelian  groups,  138-139,  139-140

of  direct  sum  of  vector  spaces,  60,  64-65 
of  exterior  algebra,  292 
of  field  of  fractions,  383 
of  free  group,  308 
of  free  R  module,  377 
of  group  algebra,  381 
of  integral  group  ring,  374 
of  localization,  431
of  product  in  a  category,  196
of  ring  of  polynomials,  150,  156-157 
of  Sⁿ(E),  285
of  symmetric  algebra,  285 
of  tensor  algebra,  282 
of  tensor  product  of  modules,  575 
of  tensor  product  of  vector  spaces,  263-264 
of  universal  enveloping  algebra,  301 
of  Λⁿ(E),  292
of  Weyl  algebra,  303 
unknown,  19 
unordered  pair,  595 
upper  bound,  603,  605 

Van  Kampen  Theorem,  323 
Vandermonde  determinant,  71,  217 
Vandermonde  matrix,  71,  217
variable,  19 
corner,  21
independent,  21
vector,  34 

addition  for,  34 
column,  25 
cyclic,  244 
row,  25 

scalar  multiplication  for,  34 
unit,  93 

vector  product,  281
vector  space,  34,  158 
associated  graded,  300 
basis  of,  36 

complex  conjugate  of,  115 
dimension  of,  37,  78 
direct  product  for,  62,  63
direct  sum  for,  59,  60,  61,  62,  64 
dual  of,  50 
filtered,  300 
finite-dimensional,  37 
graded,  300 

infinite-dimensional,  78 
left,  556 
quotient  of,  55 
vector  subspace,  35 
invariant,  218 
sum  for,  58 
volume,  86 

von  Neumann  set  theory,  594 
weight,  206 

weight  enumerator  polynomial,  209 
well  ordering,  605 


Wantzel,  473

Weyl  algebra,  302 

Weyl  basis,  296 

Wilson’s  Theorem,  201,  539 

word,  307 

word  problem,  310 

for  finitely  presented  groups,  316 

for  free  groups,  310 

for  free  products,  325,  326 

Zassenhaus,  174,  561 

Zermelo-Fraenkel  set  theory,  593 

Zermelo’s  Well-Ordering  Theorem,  466,  609 

zero  divisor,  144 

zero  matrix,  25 

zero  polynomial,  10,  150 

zero  ring,  142 

Zorn’s  Lemma,  79,  385,  466, 468,  555,  605 


